Abstract

The second U.S. Army Conference on Applied Statistics was held 23-25 October 1996 at the Monterey Beach Hotel, Monterey, CA, and hosted by the TEXCOM Experimentation Center at nearby Fort Hunter Liggett. The conference was cosponsored by the U.S. Army Research Laboratory; the U.S. Army Research Office; the U.S. Military Academy; the U.S. Army Training and Doctrine Command Analysis Center, White Sands Missile Range; the Walter Reed Army Institute of Research; and the National Institute of Standards and Technology. Papers given at the conference addressed the development of new statistical techniques, the application of existing methodologies to Army problems, and panel discussion of statistical challenges in an Army setting. A special session was included to commemorate Fort Hunter Liggett, the dedicated civilians and military who have worked there, and the countless contributions to Army testing that were developed and practiced there. This document is a compilation of available papers offered at the conference.

FOREWORD

The second U.S. Army Conference on Applied Statistics was held 23-25 October 1996 at the Monterey Beach Hotel, Monterey, CA, and hosted by the TEXCOM Experimentation Center (TEC) at nearby Fort Hunter Liggett. The conference was cosponsored by the U.S. Army Research Laboratory (ARL); the U.S. Army Research Office (ARO); the U.S. Military Academy (USMA); the Training and Doctrine Command (TRADOC) Analysis Center, White Sands Missile Range (WSMR); the Walter Reed Army Institute of Research (WRAIR); and the National Institute of Standards and Technology (NIST). The U.S. Army Conference on Applied Statistics is the successor to the U.S. Army Conference on the Design of Experiments, a historic series of meetings that formally concluded in 1994 after forty years of service to the Army. Today’s Army faces far-ranging challenges that encompass many topics to which probability and statistics can contribute, in addition to experimental design. This new conference reflects a broadening of scope, with the goal of promoting the practice of statistics in the solution of diverse Army problems.

The second conference continued in this new direction. Toward statistical education, the conference was preceded by a short course, “Quality Control: Modeling the Deming Paradigm,” given by Prof. James R. Thompson of Rice University. Distinguished speakers from academia spoke during invited general sessions: Prof. C. R. Rao, Penn State University; Prof. Ulf Grenander, Brown University; and Prof. Rob Kass, Carnegie Mellon University. A special session was included to commemorate Fort Hunter Liggett, the dedicated civilians and military who have worked there, and the countless contributions to Army testing that were developed and practiced there. As the included program indicates, many prominent individuals participated in this special session. The conference was, however, especially pleased to welcome the Honorable Philip E. Coyle, Director, Defense Directorate of Operational Test and Evaluation (DOTE), Office of the Secretary of Defense (OSD); Mr. Walter W. Hollis, Deputy Undersecretary of the Army for Operations Research (DUSA-OR); Prof. Herman Chernoff, Harvard University; Dr. Ernest Seglie, Science Advisor, DOTE; and Dr. Marion Bryson, former Director of the Combat Developments Experimentation Center (CDEC). The conference was completed with contributed sessions in which talks developed new methodology, detailed successful applications, or requested guidance from a panel of experts in attacking an Army problem that had resisted standard statistical approaches.

The Executive Board for the conference recognizes Drs. Douglas Tang, WRAIR, and Mark Vangel, NIST, for assisting with conference details; Dr. Barry Bodt, ARL, for general conference administration and proceedings; and Dr. Carl Russell, TEC, for hosting the conference and handling all local arrangements. Special thanks are due to Mrs. Patricia Winters, TEC, who served as site coordinator for the conference.


Robert Burge (WRAIR)

One of the U.S. Army Research Laboratory’s (ARL's) Science and Technology Objective (STO) research projects is to develop standardized field-operational soldier performance metrics to quantify integrated soldier-information system performance on the digital battlefield. This research effort is intended to help the Army leadership assess the impact of digitization on individual soldier and staff performance. These measurement scales directly support the Joint Venture Axis Five and Seven and Rolling Baseline assessment of digital information system technology during Advanced Technology Demonstrations, Advanced Warfighting Experiments, and related Force XXI and Army-After-Next field activities.

In conjunction with this project, ARL supported the Battle Command Battle Laboratory (BCBL) and the TRADOC Analysis Center (TRAC) in studying Battlefield Visualization issues during the Prairie Warrior 96 exercise (PW 96). Specifically, ARL's emphasis was on the Maneuver Control System/Phoenix (MCS/P) beta Battlefield Operating Systems (BOS) software, which was designed to enhance Mobile Strike Force (MSF) soldier and staff performance during the exercise by providing a clear understanding of the current state of the battlefield situation with relation to the enemy and environment.

The paper specifically describes efforts to define and measure soldier MCS/P information interface functionality and usability. It includes lessons learned from PW 96 and describes how the evaluation methods and metrics were developed and improved to produce an evaluation package that can be used in other Advanced Warfighting Experiments (AWEs), Command Post Exercises (CPXs), and simulation exercises. Results of the behaviorally anchored rating scale and usability index administered to the MSF during PW 96 are presented.

Keywords: Prairie Warrior 96, MCS/P, performance metrics, behavior anchor scales, soldier system interface

1 Introduction

The U.S. Army Research Laboratory (ARL) supported the Battle Command Battle Laboratory (BCBL) and the TRADOC Analysis Center (TRAC) in studying Battlefield Visualization issues during the Prairie Warrior 96 exercise (PW 96). Specifically, ARL's emphasis was on the Maneuver Control System/Phoenix (MCS/P) beta Battlefield Operating Systems (BOS) software, which was designed to enhance Mobile Strike Force (MSF) soldier and staff performance during the exercise by providing a clear understanding of the current state of the battlefield situation with relation to the enemy and environment. ARL assessed the soldier and staff MCS/P interface by administering anchored rating scale questionnaires to the MSF participants and through observations during SIMEX I and PW 96. In this study, we measured digital effects in terms of attitude change, behavior change, command staff task performance, and soldier-computer interface effectiveness.

Specifically, quantitative psychometric methods were used to develop behaviorally anchored rating scales and standardized task performance metrics to evaluate integrated staff and soldier information system interface performance on MCS/P. The rating scale methodology used a five-point Likert-type scale to quantify MCS/P functionality. These metrics addressed critical functional dimensions of staff performance within the Deliberate Decision Making Process (DDMP), including (1) Mission Analysis, (2) Course of Action (COA), (3) Information Assimilation, (4) Generation of Messages and Reports, (5) Workload Distribution, and (6) Development, Distribution, and Maintenance of Situation Awareness.

To study and improve soldier-computer interface software design, a heuristic evaluation was administered to the MSF. This evaluation used a usability index, developed by ARL, that focused on important soldier MCS/P interface design issues involving such characteristics as speed, utility, flexibility, consistency, intuitiveness, feedback, demand on memory, error recovery, and fatigue. These principles are based on human-system interface research outlined by Molich and Nielsen (1990). This index also used a five-point Likert-type scale. The paper specifically describes efforts to define and measure soldier MCS/P information interface functionality and usability. It includes lessons learned from PW 96 and describes how the evaluation methods and metrics were developed and improved to produce an evaluation package that can be transitioned for use in other Advanced Warfighting Experiments (AWEs), Command Post Exercises (CPXs), and simulation exercises. Results of the behaviorally anchored rating scale and usability index administered to the MSF during PW 96 are presented.

2 Prairie Warrior

The Command and General Staff College (CGSC) designed the exercise to provide Command and General Staff Officers Course (CGSOC) students with an experience similar to a Warfighter exercise and an opportunity to execute decision-making processes. Operations in a joint and multinational environment were simulated.

The Command and General Staff Officers Course A308, Battle Command Elective, provided the staff and systems training for the MSF. This included classroom instruction in MSF concepts and tactics, techniques, and procedures (TTP); hands-on training for MCS/P; two simulation exercises (SIMEXes); and the final exercise, Prairie Warrior 96 (PW 96).

As stated in the PW 96 Final Report (1996), "principal units (located at Fort Leavenworth, Kansas, unless otherwise noted) included a Combined Joint Task Force (CJTF); Combined Forces Component Commanders; a Theater Support Command (TSC), represented by the 310th Theater Army Area Command (TAACOM) operating from Fort Lee, Virginia; a student-led corps and subordinate U.S. and multinational divisions; a student-led MSF; a student-led Marine Air Ground Task Force; Analysis and Control Elements (ACEs) staffed by Military Intelligence Officer Advanced Course (MI OAC) students at Fort Huachuca, Arizona; Analysis and Control Teams (ACTs) staffed by MI Officer Basic Course (OBC) students; and a Synchronization Cell, operating from Maxwell Air Force Base, Alabama." The MSF used advanced systems with potential 2010 technology.

2.1 Maneuver Control System/Phoenix (MCS/P) (beta)

The MCS/P (beta), a prototype computerized battle command system, was the central digitized platform used in PW 96. It provided a common picture of the battlefield overlaid on Defense Mapping Agency (DMA) digital maps. The system had an inherent capability to synchronize the battle plan based on the assessment or presentation of near-real-time information and assessments from staff and subordinate commanders. MCS/P could convey current information about the location, strength, and other pertinent attributes of both friendly and enemy forces. The MSF used a total of 56 MCS/Ps in PW 96, 25 of them in the National Simulation Center (NSC) and the remainder in the Leadership Development Center (LDC). This command and control system had the following capabilities:

1. Receive enemy and friendly feeds
2. Build and manipulate databases
3. Generate and display reports
4. Create situational awareness
5. Build overlays
6. Operations planning
7. Wargaming
8. Send & receive information and briefs

The distribution of MCS/Ps and other systems in the MSF is shown in Figure 1.


Figure 1: Battlefield Operating Systems Used in PW 96

Seventeen systems were included in the PW 96 Army Tactical Command and Control Systems (ATCCS). They included: (1) Maneuver Control System/Phoenix (MCS/P), (2) Common Ground Sensor (CGS), (3) Intelligent Mine Field (IMF), (4) Advanced Field Artillery Tactical Data System (AFATDS), (5) Combat Service Support Control System (CSSCS), (6) All Source Analysis System (ASAS), (7) Map Fax System (MAPFAX), (8) Onyx Graphics System (ONYX), (9) Forward Area Air Defense Command, Control, and Intelligence (FAAD C2), (10) Army Airborne Command and Control System (A2C2S), (11) Terrain Evaluation Module-Engineer Operations System (TEM/E-OPS), (12) Downsized Ground Control Station (DGCS), (13) Voice Activation (PC Voice 10), (14) Windows Desktop Display (WINDD), (15) Corps Battle Simulation (CBS), (16) Combat Service Support Training Support Simulation (CSSTSS), and (17) Knowledge Based Logistics Planning Shell (KBLPS/Log Anchor Desk).

2.2 Digital Network Architecture

The National Simulation Center and the Leadership Development Center each contained elements of the MSF, and the Bell Hall operation contained the II Corps. The buildings were interconnected with high-capacity data lines carrying data that simulated tactical data communications (see Figure 2).


Figure 2: MSF and II Corps Element Network


2.3 Overall Staff Organization

The commander exercised command and control of the MSF through two distinct and separate tactical operations centers (TOCs A and B), the Combat Information Center (CIC), and the Analysis and Control Element (ACE). Additionally, an ad hoc TOC charged with reconnaissance, intelligence, surveillance, and target acquisition (RISTA) tasks evolved from the military intelligence battalion headquarters. TOC A focused on current operations and was employed forward on the battlefield, while TOC B focused on future or subsequent operations and was deployed in the rear. The ACE and the RISTA TOC provided a mix of collection and analysis. The CIC managed information to support the commander. Further, a mock-up of the Army Airborne Command and Control System (A2C2S) provided the MSF commander a platform to facilitate moving around the battlespace.

TRADOC’s concept of the CIC was defined by the BCBL as "the means to meet the information needs of the commander, staff, and subordinate units. As such, the CIC was used to gather, integrate, and synthesize information and/or information products into a focused, division-level database for the commander and the tactical operations centers." The CIC focused information searches to support information requirements, based on Commander’s Critical Information Requirements (CCIR), and processed information into an integrated, coherent product called the Relevant Common Picture (RCP). The concept included requirements for the CIC to meet specific requests for information not contained in the RCP that the commander and staff may have required or may later require.

An Assistant Chief of Staff for the CIC provided direct supervision of the seven-man CIC operation, which included (1) Fusion/OPS, (2) Maneuver, (3) Engineer, (4) Intelligence, (5) Air Defense, (6) Combat Service Support, and (7) Field Artillery. A staff officer manned each of the separate MCS/P workstations within the CIC and represented one of the battlefield operating systems (BOS). A Fusion/OPS workstation (Command and Control functional area) served as the integration station for information provided by the other members of the CIC. In addition to the workstations, the CIC had a large-screen display (60-inch monitor) connected to the Fusion/OPS workstation.

The functional workstations used information transmitted from BOS workstations throughout the MSF (e.g., MCS/P, TEM/E-OPS, AFATDS), depending on the functions performed. Additionally, the CIC monitored a “brick” radio tuned to the MSF operations and intelligence (O&I) radio net. This voice network provided the opportunity for noting updates of the current operations. This “stovepipe” information flow into the CIC from the appropriate areas or subordinate units gave the functional workstations the capability to manipulate or refine data to provide the required information. Once prepared, the functional workstations transmitted the information to Fusion/OPS, which condensed, verified, and distributed the information to lower echelons.

3 Performance-Based Metrics Methodology

With the Task Force XXI and Army After Next initiatives, the Army has initiated a campaign to evaluate Advanced Warfighting Experiments that will leverage superior technology to build the Army of tomorrow. The central and essential feature of this Army will be its ability to exploit information, which will lead to quick and decisive victory. Soldiers will be the most important element of Force XXI, for it is through quality soldiers that the full power of technology will be realized.

ARL assisted TRAC in assessing soldier information system interface and staff coordination and performance using MCS/P. The administration of anchor scale surveys, direct observation, and videotape recording of MCS/P use during SIMEX I and PW 96 provided the information base for this assessment. This research used psychometric principles, a staff coordination behavior-based methodology (Leedom & Simon, 1995), and Human System Interface research (Molich & Nielsen, 1990) to develop the anchor scales and standardized task performance metrics to evaluate integrated staff and soldier information system performance on the MCS/P BOS.

The Universal Joint Task List (UJTL) was used to identify essential tasks that a combat commander is required to perform in exercising command and control. This list serves as an interoperability tool to help commanders construct their joint mission essential task list. It is a comprehensive hierarchical listing of the tasks that can be performed by a joint military force.

The UJTL is organized into four separate parts by level of war: Strategic level - National military tasks, Strategic level - Theater tasks, Operational level tasks, and Tactical level tasks. Each task in the UJTL is individually indexed to reflect its placement in the structure. Thus, the UJTL provides a standard reference system for users to address and report requirements, capabilities, or issues, and as such it formed the command staff task baseline around which ARL developed its standardized soldier performance metrics research efforts.
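
To make the indexing idea concrete, the sketch below models a UJTL task index as a level-of-war prefix plus a dotted task number, so that a task's depth and parent follow directly from its index. The prefixes SN, ST, OP, and TA are our assumption about the UJTL numbering convention (this paper does not list them), and the task number in the example is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed UJTL part prefixes, one per level of war (not listed in this paper).
LEVELS = {
    "SN": "Strategic level - National military tasks",
    "ST": "Strategic level - Theater tasks",
    "OP": "Operational level tasks",
    "TA": "Tactical level tasks",
}


@dataclass(frozen=True)
class TaskIndex:
    prefix: str              # UJTL part (level of war), e.g. "OP"
    path: Tuple[int, ...]    # position in the hierarchy, e.g. (5, 1, 1)

    @property
    def level(self) -> str:
        return LEVELS[self.prefix]

    @property
    def depth(self) -> int:
        return len(self.path)

    def parent(self) -> Optional["TaskIndex"]:
        """Index of the enclosing task, or None for a top-level task."""
        if len(self.path) <= 1:
            return None
        return TaskIndex(self.prefix, self.path[:-1])


def parse_index(text: str) -> TaskIndex:
    """Parse a string such as 'OP 5.1.1' into a TaskIndex."""
    prefix, number = text.split()
    if prefix not in LEVELS:
        raise ValueError(f"unknown UJTL part: {prefix}")
    return TaskIndex(prefix, tuple(int(part) for part in number.split(".")))


if __name__ == "__main__":
    task = parse_index("OP 5.1.1")  # hypothetical task number
    print(task.level, task.depth, task.parent())
```

Because the index itself encodes the hierarchy, no separate tree structure is needed to find a task's level of war or its enclosing task.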

3.1 Behavior-Anchored MCS/P Function Support Assessment

Two behavior-anchored scale assessment instruments were developed and administered to the MSF during SIMEX I and PW 96. Using the decision-level UJTL tasks as a foundation, the first instrument focused on the interrelationship between Division Staff functions or processes and MCS/P. ARL's metrics development methodology established a crosswalk of the FM 101-5 Deliberate Decision Making Process (DDMP) with the MCS/P software modules believed to support execution of identified critical command and staff tasks. FM 101-5 states that a staff supports the science of control in four primary ways: it (1) gathers and provides information to the commander, (2) makes estimates of the set of actions required, (3) prepares plans and orders, and (4) measures organization behavior. To provide this support, the Division Staff and commanders use the DDMP, which requires staff coordination between and within echelons. It can be assumed that the MCS/P battle command system capabilities were developed to support these processes. Figure 3 depicts the ARL crosswalk of MCS/P beta software capabilities with the key staff tasks of the DDMP.
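
A crosswalk like the one in Figure 3 can be treated as a relation between software capabilities and DDMP steps. The minimal sketch below stores it as a mapping and inverts it to show which capabilities are believed to support each step, and which steps no capability covers. Capability names are taken from the MCS/P list in Section 2.1, but the step names and all pairings are hypothetical placeholders, not the contents of Figure 3.

```python
from collections import defaultdict

# Hypothetical crosswalk: MCS/P capability -> DDMP steps it is believed to support.
# The pairings are placeholders, not the contents of Figure 3.
CROSSWALK = {
    "Build overlays": {"Mission analysis", "COA development"},
    "Wargaming": {"COA analysis"},
    "Generate and display reports": {"Orders production"},
    "Send & receive information and briefs": {"Orders production", "Mission analysis"},
}


def steps_to_capabilities(crosswalk):
    """Invert the crosswalk so each DDMP step lists its supporting capabilities."""
    inverted = defaultdict(set)
    for capability, steps in crosswalk.items():
        for step in steps:
            inverted[step].add(capability)
    return dict(inverted)


def unsupported_steps(crosswalk, all_steps):
    """Return the DDMP steps that no capability in the crosswalk supports."""
    covered = steps_to_capabilities(crosswalk)
    return sorted(step for step in all_steps if step not in covered)


if __name__ == "__main__":
    steps = ["Mission analysis", "COA development", "COA analysis",
             "COA comparison", "Orders production"]  # illustrative step names
    for step, caps in sorted(steps_to_capabilities(CROSSWALK).items()):
        print(step, "<-", sorted(caps))
    print("No support mapped:", unsupported_steps(CROSSWALK, steps))
```

Reading the inverted mapping step by step mirrors how the crosswalk is used to decide which staff tasks the rating dimensions should probe.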

FIGURE 3: Crosswalk of MCS/P Capabilities vs. Key Deliberate Decision Making Processes.

OB: Order of Battle; OPORD: Operation Order; COAST: Course of Action Situational Template

Given the crosswalk matrix, ARL-HRED developed six key behavior-based staff coordination evaluation dimensions to assess the ability of the digitized maneuver command and control system, MCS/P, to support the DDMP and staff coordination. These six dimensions are listed in Table 1. Each dimension is defined in terms of sub-dimensions and specific, operationally relevant, staff-related behavior. The behavior anchor scale format standardized the MSF's perception of what each dimension was trying to assess and reduced experimental response variability. Definitions and descriptions of MCS/P-supported behavior for greatly facilitated, borderline, and greatly hindered performance were developed, including example tasks. The written descriptions of the levels of performance for each sub-dimension were assigned values of 5 through 1 (1 indicating that MCS/P hindered performance, 3 that it was the same as manual methods, and 5 that it facilitated performance) to serve as anchors for the five-point Likert-type scale. Guidelines were prepared to assist the MSF in assessing how well the staff performed. The MSF assessed the key Maneuver Staff functions after SIMEX I, which was used as a training exercise, and after PW 96, the main combat exercise.
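
The anchoring rule above (5 = facilitated performance, 3 = same as manual methods, 1 = hindered performance) suggests a simple summary for any sub-dimension: the share of respondents who rated MCS/P above, at, or below the manual baseline. The sketch below assumes integer ratings from 1 to 5; the wording for ratings 4 and 2 and the sample responses are hypothetical, not taken from the MSF data.

```python
from collections import Counter

# Five-point anchored scale. The anchors for 5, 3, and 1 follow the text;
# the wording used for 4 and 2 is an assumption for illustration.
ANCHORS = {
    5: "facilitated performance",
    4: "somewhat facilitated performance",
    3: "same as manual methods",
    2: "somewhat hindered performance",
    1: "hindered performance",
}
BASELINE = 3  # rating equivalent to manual methods


def summarize(ratings):
    """Share of ratings above, at, and below the manual-methods baseline."""
    if not ratings:
        raise ValueError("no ratings supplied")
    for r in ratings:
        if r not in ANCHORS:
            raise ValueError(f"rating out of range: {r}")
    counts = Counter(
        "better" if r > BASELINE else "same" if r == BASELINE else "worse"
        for r in ratings
    )
    n = len(ratings)
    return {k: counts[k] / n for k in ("better", "same", "worse")}


if __name__ == "__main__":
    responses = [5, 4, 4, 3, 2, 5, 3, 4]  # hypothetical ratings for one sub-dimension
    for label, share in summarize(responses).items():
        print(f"{label}: {share:.0%}")
```

Collapsing to three bins keeps the manual-methods anchor (3) as the explicit point of comparison, which is how the scale was defined.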

Table 1: Behavior Evaluation Dimensions

Evaluation Dimension | Sub-Dimension | Behavior Anchor Focus

Mission Planning and Refinement | [illegible in source] | Automated information being readily available and accessible to facilitate horizontal and parallel planning
Mission Planning and Refinement | Impact on COA Development | Coordinated input of key staff perspectives into the developing COAs
Mission Planning and Refinement | Impact on COA Analysis | Staff simultaneously analyzing alternative COAs by maintaining a shared common understanding of mission intent, joint identification of COA problems, branch contingencies, etc.
Information Assimilation | Assimilation of digitized messages | Finding, reviewing, and assimilating information from text messages to obtain CCIR
Information Assimilation | Assimilation of digitized graphics | Finding, reviewing, and assimilating information from graphical display to obtain CCIR
Generation of Messages and Reports | Enhance ability to prepare orders and reports | Supporting the staff's ability to prepare and send desired messages and reports
Situational Awareness | Real-time access to data sources at all echelons for effective CCIR-based push/pulls? | Staff maintaining a shared, real-time awareness of the battlespace which is formulated into a coordinated RCP. Selective filtering and assimilation of situation-based information.
Situational Awareness | Facilitate effective monitoring of critical events and receipt of critical messages | How digitization assisted the battle staff in keeping each element aware and informed of critical events and factors.
The Relevant Common Picture | Facilitate development and maintenance of a coordinated relevant common picture? | The formulation of the RCP graphic visualizations and initial information dissemination. Staff automatic situation information monitoring. Automated graphic aids for timely RCP and follow-on distribution?
The Relevant Common Picture | Facilitate distribution of the relevant common picture updates to all battle command elements? | Timely distribution of the RCP graphic visualizations and information updates. Automated situation monitoring. Automated graphic aids for timely RCP updating and follow-on distribution?
Workload Distribution | Appropriately distribute mission tasks between staff | Mission task prioritization and workload distribution.

3.2 Task-Centered Usability Assessment

The second standardized instrument developed by ARL-HRED focused on the usability of the individual soldier-MCS/P system interface. Certain system design characteristics that reflect platforms with good interface usability have been defined in the literature (Nielsen & Molich, 1990). ARL-HRED used these design characteristics to focus on rating 12 staff tasks on sixteen interface usability and graphics issues, as shown in Table 2. These characteristics include whether the computer system contains simple and natural dialogue, reflects doctrine or “speaks the user's language,” minimizes user memory load, remains consistent between different modules, provides feedback, provides clearly marked exits, provides shortcuts, and prevents errors.

The usability factor has a direct impact on the tactical decision-making process. Malfunctions in system usability lead to underlying error patterns such as attention fatigue, excessive mental workload, inappropriate priorities, delays in tempo, and, ultimately, communication failures. These error problems can lead to more serious tactical failures such as inadequate battle plans, inadequate reporting, fratricide, lack of coordination, and inadequate situational awareness. Understanding the individual soldier-MCS/P system interface also points the way to correcting lapses and underlying error problems in the system interface, which in turn can prevent major system failures and significantly increase the chance of success in combat operations on the battlefield. In application, a heuristic evaluation was done by having the MSF rate all of the tasks for each issue on a scale of one to five (one being the worst and five the best) after having used MCS/P during SIMEX I and the actual Prairie Warrior exercise. Heuristic evaluation, as described by Nielsen and Molich (1990), is a method of usability analysis in which a number of users are presented with an interface design and then expected to comment on it. As in the first assessment instrument, the soldiers' perception of the usability issues as related to the Maneuver Staff tasks was standardized using anchor scale methodology.
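
Each respondent's heuristic evaluation yields a grid of ratings, one per Maneuver Staff task and usability issue, on the one-to-five scale. A minimal way to summarize such grids is to pool each issue's ratings across tasks and respondents and rank the issues, which highlights the interface characteristics rated weakest. Task and issue names below come from Table 2, but the ratings are invented for illustration, and averaging ordinal Likert ratings is itself an assumption (the median is offered as an alternative).

```python
from statistics import mean, median

# ratings[respondent][task][issue] -> rating on the 1 (worst) to 5 (best) scale.
# Hypothetical data; task and issue names are taken from Table 2.
ratings = [
    {"Displaying & Manipulating Maps": {"Tempo": 4, "Utility": 5, "Good Error Recovery": 2},
     "Preparing Briefings": {"Tempo": 3, "Utility": 4, "Good Error Recovery": 2}},
    {"Displaying & Manipulating Maps": {"Tempo": 5, "Utility": 4, "Good Error Recovery": 3},
     "Preparing Briefings": {"Tempo": 2, "Utility": 3, "Good Error Recovery": 1}},
]


def issue_scores(data, summary=mean):
    """Pool every rating of each usability issue across tasks and respondents."""
    pooled = {}
    for respondent in data:
        for task_ratings in respondent.values():
            for issue, value in task_ratings.items():
                pooled.setdefault(issue, []).append(value)
    return {issue: summary(values) for issue, values in pooled.items()}


def weakest_issues(data, k=1, summary=mean):
    """Return the k usability issues with the lowest summary score."""
    scores = issue_scores(data, summary)
    return sorted(scores, key=scores.get)[:k]


if __name__ == "__main__":
    print(issue_scores(ratings))
    print(issue_scores(ratings, summary=median))
    print("Weakest:", weakest_issues(ratings))
```

Passing the summary function as a parameter makes it easy to compare mean-based and median-based rankings on the same grid.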

Table 2: Usability Index Issues and Maneuver Staff Tasks

Maneuver Staff Tasks:
Displaying & Manipulating Maps
Plotting & Manipulating Units
Building Overlays/Templates
Creating, Editing, Updating Databases
Building Friendly & Enemy Order of Battle
Preparing Task Organizations
Computing Force Ratios
Preparing Briefings
Preparing Operation Orders
Building & Displaying Alarms
Sending & Receiving Inf.

Usability Issues:
Tempo
Utility
Flexibility in use
Prevent Fatigue
Mirror Doctrine
Provide process Short Cuts
Consistency between Modules
Minimize demand on Memory
Provide Feedback
Good Error Recovery
Process Shortcuts
Intuitiveness

3.3 Performance-Based Metrics Participants

The information interface performance metrics were administered to the entire MSF during SIMEX I and PW 96. This force was composed primarily of students (88 Majors) from the Command and General Staff Officers Course (CGSOC) A308, Battle Command Elective (January-May 1996). This course provided training for commanders and staff officers, which included classroom instruction in MSF concepts and tactics, techniques, and procedures (TTP) and hands-on training for MCS/P. The class also included two simulation exercises (SIMEXes) and the culminating CGSOC exercise, PW 96.

4  Results  &  Discussion 

4.1  MSF  Responses 

ARL  administered  the  instruments  to  the  aitire  MSF  immediately  after  SIMEX  1  and  the  PW  96 
exercise.  A  total  of  84  (  95  %)  surveys  were  coirq)leted  and  returned  after  SIMEX  I  and  44 
(54  %)  after  PW  96.  Considering  that  the  end  of  the  PW  96  exercise  coincided  with  graduaticai 
ceremcHiies  and  the  students  being  assigned  to  dher  duty  stations  and  packing  to  leave  Fort 
Leavenworth,  Kansas,  a  decline  in  the  responses  was  not  surprising.  In  addition,  8  students  who 
were  assigned  to  the  MSF  Commanders  staff,  did  not  use  MCS/P  during  the  PW  96  exercise  and 
submitted  blank  questionnaires  after  PW  96.  Since  they  did  not  use  MCS/P  since  SIMEX  I,  ftiey 
stated  that  their  responses  after  PW  96  would  be  exactly  the  same  as  their  respcxrse  after  the 
simulation  exercise. 

To address response bias, the SIMEX I response distribution of the 44 students who responded to
both SIMEX I and PW 96 was compared to the response distribution of the 40 students who
responded after SIMEX I but did not respond after PW 96. A nonparametric chi-square statistic
was used to determine whether the frequency of responses across the rating scale was statistically
different between the two response groups for each question. For 93% of all questions, no
significant difference between the groups could be detected at the .05 significance level. By
chance alone, one would expect between two and three questions to be significant when testing at
the .05 level. Thus, there is no evidence of a significant response bias.
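The response-bias check above compares two groups' answer distributions on one rating question at a time. That comparison is a chi-square test of homogeneity on a 2 × k table. The following is a minimal sketch in Python using only the standard library. The counts, the 5-point scale, and the number of questions (50) are hypothetical, because the paper reports only summary results. The last lines reproduce the "two to three by chance" reasoning. With m independent tests at level alpha, the expected number of false positives is m × alpha, and a binomial tail gives the chance of seeing at least k of them. Independence across questions is an assumption that survey items would not strictly satisfy.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with df degrees of freedom.

    Computed as 1 - P(df/2, x/2), where P is the regularized lower incomplete
    gamma function evaluated by its power series (adequate for moderate x).
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi2_homogeneity(table):
    """Pearson chi-square test that every row has the same distribution over the columns.

    Assumes at least two rating categories with nonzero totals.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    used = [j for j, c in enumerate(col_tot) if c > 0]  # drop empty rating categories
    stat = 0.0
    for i, row in enumerate(table):
        for j in used:
            expected = row_tot[i] * col_tot[j] / n
            stat += (row[j] - expected) ** 2 / expected
    df = (len(table) - 1) * (len(used) - 1)
    return stat, df, chi2_sf(stat, df)


def prob_at_least(k: int, m: int, alpha: float) -> float:
    """P(at least k of m independent tests are significant when every null hypothesis is true)."""
    return sum(math.comb(m, i) * alpha**i * (1 - alpha) ** (m - i) for i in range(k, m + 1))


# Hypothetical counts on a 5-point scale for one question:
# row 0 = answered after both SIMEX I and PW 96 (44), row 1 = SIMEX I only (40).
stat, df, p = chi2_homogeneity([[3, 6, 15, 14, 6], [2, 7, 13, 12, 6]])
print(f"chi-square = {stat:.2f}, df = {df}, p = {p:.3f}")

m, alpha = 50, 0.05  # hypothetical number of questions and test level
print("expected significant by chance:", m * alpha)
print("P(at least 4 significant by chance):", round(prob_at_least(4, m, alpha), 3))
```

With cell counts this small, several expected counts fall below 5. A common rule of thumb treats the chi-square approximation as rough in that situation.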


4.2  Database  Building,  Manipulating,  and  Editing 

For database work, the MCS/P was extremely flexible and useful, and it reduced the time it took
for the user to manipulate databases. More than 70% of the MSF respondents (Chi Square = 22.4,
p < .05) liked the utility and speed of the MCS/P database capabilities, indicated that they
mirrored doctrine, and found them less fatiguing than standard methods. The database system
allowed the user to construct a list of the friendly and enemy databases used most frequently by
each staff element (G2, G3, engineer, etc.). However, only one record in a database could be
located at a time. This serial editing and retrieval of information violated cognitive congruency
between the soldiers' expectations and MCS/P. It caused the DDMP to break down and forced the
decision makers to invoke other cognitive tactics, such as decision forestalling and
assumption-based reasoning. Cognitive congruency relates to the degree to which the information
management and display paradigms are matched with the training and experience of the human
operator. Because a database could contain more than 100 records, this record-locating procedure
was time-consuming and increased user workload and demand on memory. The majority of the
respondents (56%) felt that the error recovery was poor, with only 10% expressing an opinion that
it was good (Chi Square = 19.3, p < .05). Regarding editing, the editor window was extremely
cluttered, and its use was neither intuitive nor consistent. One example of this confusion was the
“edit records” procedure, which used ADD, MODIFY, and DELETE commands. Another example was the
“retrieve records” procedure, which used the FETCH or QUERY command. The editor window should be
simplified, and the capability to handle more than one record at a time should be included in
future MCS/P development.
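The chi-square values attached to the rating percentages are consistent with a goodness-of-fit test of the response counts against an even spread over the rating categories. An example is Chi Square = 22.4 for the database items. The paper does not state the null distribution, so this reading is an assumption. The following is a minimal sketch with hypothetical counts on a 5-point scale. A significant result shows only that the responses are not spread evenly. Whether they lean favorable or unfavorable has to be read from the percentages.

```python
# Upper 5% critical values of the chi-square distribution for 1-4 degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi2_gof_uniform(counts):
    """Pearson goodness-of-fit statistic against equal expected counts in every category."""
    n, k = sum(counts), len(counts)
    expected = n / k
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return stat, k - 1


# Hypothetical ratings from 84 respondents, ordered "much worse" ... "much better".
counts = [2, 5, 12, 30, 35]
stat, df = chi2_gof_uniform(counts)
favorable = (counts[3] + counts[4]) / sum(counts)
print(f"chi-square = {stat:.1f}, df = {df}, reject uniform at .05: {stat > CRITICAL_05[df]}")
print(f"favorable share = {favorable:.0%}")
```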

4.3  Creating  Situation  Awareness 

The usefulness of the MCS/P in creating situation awareness varied across the MSF echelons and was
a function of information timeliness, accuracy, and detail. At the division level, the size and
resolution of the 13-inch display prevented the commander from obtaining a detailed view of the
entire MSF battlespace and from visualizing unit movement in a large area (275 km × 275 km). When
users attempted to view this wide-area battlespace on the 13-inch computer monitor, the multitude
of symbols and icons presented a cluttered display. This limited area of view and display clutter
(a lack of cognitive congruency) did not fit the commander's experience-based mental model and
limited his insight. To deal with this limitation, division commanders relied on map sheets to do
mission planning and analysis. Fewer than 20% of the respondents (Chi Square = 60.1, p < .01) felt
that digitization facilitated mission analysis, because of this limitation in battlefield
visualization. In their comments, respondents stated that they relied on the map sheets to conduct
mission planning.

The MCS/P, however, was a good tool for displaying information. At the smaller brigade area of
interest, it allowed the staff elements (soldiers) to integrate this information, create situational
awareness of their battlespace, and conduct course of action (COA) development. More than 57% of the
MSF rated the MCS/P as facilitating or greatly facilitating these tasks. In addition, 97% felt that
it supported the staff's ability to monitor critical events, receive critical messages in a timely
fashion, distribute the Relevant Common Picture (RCP), and keep all elements aware of critical
change (Chi Square = 43.8, p < .01). The MSF users displayed maps, set map features, plotted
military units, and manipulated unit icons easily. The users were also able to easily zoom into an
area of interest using either a raster graphics map or a vector map. The location tracker assisted
the commander in locating positions on a map and in cross-validating positions received from ASAS.
The majority of the MSF participants reported that the map tools mirrored doctrine (69%), were
consistent (80%), and placed a moderate to low demand on memory (77%). The armor brigade commander
used the field-of-view tool to tactically position his units. Map features were easily displayed,
and the operations officer and engineer were observed using the distance tracker tool. The ability
to display a unit's strength was a valuable tool in developing COAs. The armor brigade commander in
the MSF frequently used this tool to determine the effectiveness of the COA that he had previously
chosen.

At STARTEX of PW 96, there was a problem with data transfer, because the MSF staff could not
retrieve a database and transfer information automatically. To compensate for this deficiency,
templates were created manually by drawing phase lines and units. This delayed the development of
the RCP and diminished its accuracy regarding enemy positions. Over SIMEX I, SIMEX II, and PW 96,
data flows to the CIC improved, and the time it took to develop the current RCP decreased by 30
minutes on average. However, enemy location data were still one hour old and therefore not timely.

In conclusion, the MCS/P has numerous tools to create the RCP, but when the information is neither
timely nor accurate, relevant situational awareness will not be achieved. This deficiency degrades
performance, especially during close battle operations.

4.4  Templates  and  Drawing  Tools 

The overlay building capability of MCS/P was the most utilized tool in MCS/P. The entire MSF was
observed using and developing templates. Many staff members considered overlays to be “snapshots”
of the battlefield, and these snapshots flowed effectively among the MSF. After PW 96, almost 70%
of the MSF rated this capability as better or much better than standard manual methods (paper maps,
grease pencils) (Chi Square = 15.1, p < .01). The MCS/P overlays facilitated planning and COA
development; 70% of the MSF responded that they supported or facilitated COA development. They did
so by giving brigade staffs the ability to “call up” adjacent units' overlays, as well as operation
orders prepared using Microsoft Word (WINDD). This overlay retrieval capability allowed the brigade
staffs to review the MSF division branch plans as they were performing their missions, and it
allowed the brigades to quickly develop FRAGOs. This tool was extremely flexible. Items on a map
were saved in various layers so that they could be tailored for different echelons and functions.
For example, the default layer was used to present shapes, lines, and text. The command layer
contained units from a database, map features, and obstacles. The grid layer contained grid
information.

However, the drawing tool was limited and not user-friendly. The MCS/P used during PW 96 allowed
the user to draw line segments, polygons, ellipses, rectangles, and circles. The MSF spent
inordinate amounts of time drawing arrows and phase lines and positioning icons on the map. By the
end of PW 96, the MSF was much more proficient in developing their overlays. This is reflected in
the ARL survey ratings of the MCS/P's ability to facilitate three activities: monitoring critical
events, receiving critical messages, and developing, updating, and distributing the RCP. The
distribution of these ratings shifted significantly between SIMEX I and PW 96 (Chi-Square = 10.84,
p < 0.02). The drawing tool needs to be upgraded by automating the placement of phase lines,
arrows, and other icons and figures. Building a template was complicated, involving a multitude of
windows and menus. For example, it took several commands to plot one symbol from an existing
palette. A simple drawing and symbol menu needs to be developed. With such a menu, the soldier could
click on a desired symbol and then insert it at the position of his or her cursor, as can be done
using Microsoft Word.
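The layer scheme described above can be sketched as a small data structure. In that scheme, map items are stored on separate layers that can be shown or hidden to suit an echelon or function. This is a hypothetical illustration in Python, not the MCS/P implementation. The class names, method names, and sample items are invented; only the three layer names ("default", "command", "grid") come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Layer:
    name: str
    items: List[str] = field(default_factory=list)
    visible: bool = True


@dataclass
class Overlay:
    layers: Dict[str, Layer] = field(default_factory=dict)

    def add(self, layer_name: str, item: str) -> None:
        """Store an item on a named layer, creating the layer on first use."""
        self.layers.setdefault(layer_name, Layer(layer_name)).items.append(item)

    def set_visible(self, layer_name: str, visible: bool) -> None:
        self.layers[layer_name].visible = visible

    def view(self, wanted: Iterable[str] = ()) -> List[str]:
        """Items on visible layers; if `wanted` is given, only those layers are considered."""
        wanted = set(wanted)
        return [
            item
            for layer in self.layers.values()
            if layer.visible and (not wanted or layer.name in wanted)
            for item in layer.items
        ]


overlay = Overlay()
overlay.add("default", "phase line (text label)")
overlay.add("command", "enemy unit from database")
overlay.add("command", "obstacle")
overlay.add("grid", "1 km grid")
overlay.set_visible("grid", False)  # e.g. an uncluttered commander's view
print(overlay.view())               # default and command items only
print(overlay.view(["command"]))    # e.g. an engineer's view of units and obstacles
```

Because each echelon's view is a filter over shared layers, one overlay file can serve several staff functions without being redrawn.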


4.5  Soldier-Computer  Usability 

The primary interface deficiencies that soldiers found in the MCS/P (Beta) concerned stability,
error recovery, simplicity, user feedback, and process shortcuts. The soldier had poor feedback
(lack of knowledge of results). Nothing indicated that the MCS/P was processing a function, such
as retrieving a map, or that a system error had occurred. As a result, soldiers tried alternate
sequences of button presses to achieve the desired function. This led to further deterioration of
the computer system, system lockups, and the loss of previous work. Over a six-hour period during
PW 96, three instances of MCS/Ps locking up were observed. Future systems should provide a visual
icon that shows the system is in the midst of processing, understandable error messages, and
automatic system backup.

The need for improved system error recovery was reflected in the MSF's responses to the Task
Centered Usability Index regarding error recovery. The majority of the MSF members who responded
felt that the error recovery capability of MCS/P needed improvement. Less than 11% of the responses
rated the MCS/P as having good or excellent error recovery capabilities (Chi Square = 10.7,
p < .02).

Observation revealed a need for consistency between various system functions. For example, the
user could select QUIT, END, or EXIT to end different functions. There were too many menus and
steps, and there were no well-defined shortcuts to perform or quit a function.

The survey also revealed that, in general, the system increased the tempo of activity, was
flexible, and mirrored doctrine. It was also not fatiguing to use for more than ten hours of high
and low periods of battlefield operations, across various maneuver command and control tasks in a
stationary environment. Almost 60% of the MSF felt that digitization (MCS/P) reduced the time it
took for the battle staff to complete its tasks (Chi Square = 72.0, p < .01). Across all modules
and staff tasks, the opinion of the MSF was that the system mirrored doctrine. Over 60% of the
responses rated the MCS/P software, automated processes, and sequences as mirroring doctrine
accurately or very accurately. Seventy-nine percent of the MSF's responses reflected that the
fatigue level experienced by the MSF was the same as, or less than, that of performing the tasks
manually (Chi Square = 20.9, p < 0.1).

4.6  Sending  and  Receiving  Information 

The e-mail function in MCS/P was extremely useful in maneuver command and control, and the
“transfer tool” was an outstanding software application within e-mail. This function allowed the
soldier to send and receive messages, overlays, and reports quickly to and from selected locations
on the battlefield. Seventy percent of the MSF felt that the MCS/P supported or enhanced the
ability of the staff to distribute the RCP and keep all MSF elements aware of critical situational
change (Chi Square = 15.1, p < .01).

The transfer tool window consisted of three columns split into upper and lower sections. The
upper-left column listed overlays, the upper-middle column listed databases, and the upper-right
column listed site addresses for potential recipients. As the user selected the overlays or
databases to be sent and the intended recipients, the selected items appeared in the bottom
sections of their corresponding columns. This organized visual feedback allowed easy and quick
selection and transfer of information.

The e-mail application also allowed the user to transfer overlays between and within echelons. To
retrieve an overlay from another machine, the user had to know the other machine's address. The
e-mail system was also extremely flexible in editing destination addresses and in receiving
brigade-and-below command and control reports. This tool should remain a standard feature of MCS/P.

4.7  Collaborative  Staff  Information  System  Interface 

The MCS/P digital informatics technology offered significant improvements in MSF collaborative
staff performance over current manual, paper-based, voice-communicated command staff products. The
MSF staff was organized around the conceptual staff element (CIC). It used the MCS/P system
extensively to distribute and maintain situational awareness, through continuous generation and
distribution of RCP graphics and "Post-it Notes."

During PW 96, the staff's MCS/P-aided collaborative planning and execution tasks greatly improved
over the initial attempts of SIMEX I. However, some performance shortcomings remained unchanged
throughout the PW 96 exercise. These were mainly due to a combination of ineffective staff training
on MCS/P and MCS/P software deficiencies. The training problem had two parts:

(1) the lack of command staff/CIC staff training as a collaborative team, and
(2) the lack of effective individual training on MCS/P functions.

The second problem caused staff members to spend too much time deciding how to execute an MCS/P
function, rather than spending time performing critical staff functions with MCS/P. The lack of an
effective MCS/P user's manual greatly contributed to the training shortfall. The overall MSF staff
skill levels on MCS/P software functionality continued to improve throughout PW 96, but as a unit,
the staff never reached fully effective levels. Some individual users possessed superior computer
skills and expertise with the various MCS/P automated tools. These users did emerge during PW 96 to
demonstrate the promise of digitization to greatly improve warfighter effectiveness. For example,
in the armor brigade, the engineer used the 3D terrain elevation, line-of-sight, and field-of-view
tools to effectively optimize his units' positions in relation to the OPFOR.

For key staff planning and coordination portions of the TDMP (Mission Analysis, COA Development and
Analysis, and Wargaming), the MCS/P does not support "two-way" interactive parallel planning
between higher and adjacent units as fully as it might. The same is true for some key staff tasks
conducted during battle execution phases (Engineer Operations, Tactical Fire Direction, and ADA).
The critical software deficiencies in aiding collaborative staff performance included:

(1) the lack of an “event-driven Synch Matrix” tool;
(2) the lack of automated OPORD and report generation tools (the Windows Desktop Display (WINDD)
    and a networked file server served this purpose); and
(3) the lack of an effective map display (i.e., poor resolution, poorly readable map terrain, and
    awkward scaling tools).

The third deficiency resulted in the commanders using paper maps at the division level to do
mission planning. From past experience, the commander's mental expectation of a synchronization
matrix was that it is an event-driven process. Instead, the MCS/P required the commander to
synchronize his plan according to an arbitrary time schedule. This lack of cognitive congruency
resulted in the battle staff using Windows Desktop Display software to develop their
synchronization matrix manually. Over 65% of the MSF responses rated the synchronization matrix as
slower than paper-based, voice-communicated methods, and as not as intuitive or useful as those
methods (Chi Square = 12.01, p < .02).
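The contrast drawn above between an event-driven synchronization matrix and one keyed to an arbitrary time schedule can be made concrete with a small sketch. It is hypothetical: the row fields, trigger names, and H-hour offsets are invented for illustration and do not describe MCS/P. In the event-driven form, an action is released when its triggering event is reported, whatever the clock says. In the time-driven form, an action is released when the clock reaches a fixed offset, whether or not the battle has developed as planned.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class SyncRow:
    function: str        # staff function or battlefield operating system
    action: str
    trigger: str         # event that releases the action (event-driven form)
    h_plus_minutes: int  # planned clock time of the action (time-driven form)


MATRIX = [
    SyncRow("maneuver", "commit reserve", "enemy crosses PL RED", 90),
    SyncRow("fire support", "shift fires to TRP 2", "lead brigade reaches PL BLUE", 60),
    SyncRow("engineer", "close lane", "last unit passes obstacle", 120),
]


def released_by_events(matrix: List[SyncRow], events: Set[str]) -> List[Tuple[str, str]]:
    """Event-driven: actions whose trigger has actually been reported."""
    return [(r.function, r.action) for r in matrix if r.trigger in events]


def released_by_clock(matrix: List[SyncRow], now_minutes: int) -> List[Tuple[str, str]]:
    """Time-driven: actions whose scheduled time has passed, regardless of events."""
    return [(r.function, r.action) for r in matrix if r.h_plus_minutes <= now_minutes]


# Battle running ahead of schedule: the enemy reaches PL RED at H+45.
print(released_by_events(MATRIX, {"enemy crosses PL RED"}))
print(released_by_clock(MATRIX, 45))
```

When the battle runs ahead of the plan, the event-driven form releases the reserve immediately, while the time-driven form has released nothing yet. That mismatch is the one the commanders describe.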
For collaborative staff planning, many MSF staff members considered the MCS/P RCP overlays to be
simple division-level sketches, and these sketches flowed very effectively among the MSF. The RCP
helped the various MSF staff elements share a “common picture” of the division battlespace.
However, the RCP was only as timely (and, hence, accurate or relevant) as the “age” of the data
sources used and the effectiveness of the TTPs followed for its development. Thus, the MSF staff
generally considered the RCP to be of limited value because of the time lag inherent in its
production. Finally, the MCS/P's screen was small and lacked resolution, so not all parties in a
cell could view the display with the needed detail. As a result, collaborative planning within a
cell was not easily accomplished. Much collaborative planning at various MSF staff elements
therefore still centered on the use of large paper maps. The majority of the MSF (60%) rated the
MCS/P as offering only borderline support in the conduct of mission planning and analysis
(Chi Square = 60.1, p < .01).

MCS/P capabilities proved very effective when time was constrained, such as in a time-constrained
fragmentary order (FRAGO). Automated graphic capabilities became very effective tools for
collaborative interaction between time-stressed planning cells. These capabilities included
plotting enemy locations from databases, establishing unit Order of Battle, tracking high priority
target artillery groupings, identifying key road networks, and factoring in visibility or
elevation data. On the other hand, using MCS/P for quick time and pace analysis was functionally
complex. Such analysis was needed to plan immediate actions that exploit windows of opportunity or
eliminate unseen threats. Because of this complexity, some BDE staff elements found MCS/P less than
optimum for these fast-paced mission coordination and execution functions. Eighty-two percent of
the MSF members who responded felt that the MCS/P did not support the staff, or offered only
borderline support, in simultaneously analyzing courses of action (COAs) (Chi Square = 15.9,
p < .01). This was especially noticeable during close battle. Instead, voice SALUTE (size,
activity, location, unit, time, and equipment) reports were the preferred means of exchanging key
time-sensitive close fight information between higher and adjacent echelons.

During the slower-paced planning phases of the TDMP, the MCS/P "Post-it Notes" capability was
considered an excellent tool for CCIR information exchange. However, most of the MSF staff were not
trained well enough to routinely establish “selective filter alarms.” These alarms would have let
the MCS/P automatically screen and display CCIR-oriented messages over the changing phases of the
battle. During the time-sensitive collaborative monitoring of the close battle, many staff members
considered the "Post-it Notes" process too slow and cumbersome for use. Additionally, the lack of
effective user-set alarm selectivity caused some staff members to be inundated with messages, so
they simply disabled the MCS/P alarm function. In the case of the “Air Strike Warning,” one ADA
staff officer reported a timing problem. Processing the warning and overlaying it on friendly
units took so many key and mouse manipulations that the resulting unit vulnerability information
came too late to warn threatened units.

The various functionalities resident in MCS/P were complex and inconsistent, and getting critical
information from the system was slow and process-intensive. Consequently, many close fight
coordination efforts between various staff elements (e.g., TOC-A and Divarty) were done by voice.
In summary, given these MCS/P shortcomings, the voice mode became the communication channel of
choice for time-critical collaborative exchanges for many MSF staff members.
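The “selective filter alarms” mentioned above amount to a per-phase filter over incoming messages. For each battle phase, the user names the CCIR topics that should raise an alarm; all other messages stay in the queue without triggering an alert. The following is a minimal hypothetical sketch. The topic names, phase names, and message format are invented, and the paper does not describe how MCS/P implemented its filters.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Message:
    topic: str
    text: str


# Hypothetical user-set CCIR alarm filters, one set of topics per battle phase.
ALARM_FILTERS: Dict[str, Set[str]] = {
    "planning": {"enemy reserve location", "bridge status"},
    "close battle": {"air strike warning", "enemy reserve location"},
}


def alarms(messages: List[Message], phase: str,
           filters: Dict[str, Set[str]] = ALARM_FILTERS) -> List[Message]:
    """Return only the messages whose topic is a CCIR for the current phase."""
    wanted = filters.get(phase, set())
    return [m for m in messages if m.topic in wanted]


inbox = [
    Message("logistics status", "Class III resupply complete"),
    Message("air strike warning", "Enemy aircraft inbound to sector"),
    Message("bridge status", "Bridge at grid 1234 intact"),
]
for m in alarms(inbox, "close battle"):
    print("ALARM:", m.text)
```

In this design, the filter rather than the operator does the screening. Reconfiguring the filter at each phase change is the step the text says most of the staff were not trained to perform.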
5 Summary

Based on the Performance Based Metrics, the MCS/P was somewhat effective in creating the MSF
staff's situational awareness and in portraying and communicating a timely and accurate relevant
common picture (RCP). The MCS/P's support for building and transferring databases was a strong
point. However, the system provided to the MSF had some shortcomings. The wargaming tool did not
work. The system was not stable; because of poor user feedback, it was prone to crashes during
much of the SIMEXes. Editing and management tools for database records needed improvement, as did
several system usability and interface characteristics.

Individual performance of MCS/P-aided collaborative planning and execution tasks greatly improved
over the initial attempts in SIMEX I. Even so, some shortcomings remained unchanged throughout the
experimentation. The main causes were a lack of in-depth experience with MCS/P and MCS/P software
deficiencies. Because of this lack of experience, staff members spent large amounts of time
determining how to execute an MCS/P function, rather than performing critical staff functions with
MCS/P. The MCS/P user's manual was ineffective, and this greatly contributed to the training
shortfall. The overall MSF staff skill levels on MCS/P software functionality continued to improve
throughout PW 96, but some individuals never reached fully effective levels. Some individual users,
with higher degrees of computer skills and expertise with the various MCS/P automated tools, did
emerge during PW 96. They demonstrated that digitization has the potential to greatly improve the
soldiers' warfighting effectiveness.

The staff lacked both confidence in MCS/P and experience with the various functionalities resident
in it. Getting critical information from the MCS/P system also involved slow response times and
process-intensive effort. For these reasons, many close fight coordination efforts between various
staff elements were done by voice. The voice mode became the communication channel of choice for
time-critical collaborative exchanges for many MSF staff members.
The Impact Of Field Of View On The
Performance  Of  Some  Infantry  Tasks 

Eugene  Dutoit 

D. Ayers,  F. Heller,  C. Holloway,  K. McDonald,  E. Redden 
Dismounted  Battlespace  Battle  Lab 
Fort Benning, Georgia 31905

ABSTRACT 

The Dismounted Battlespace Battle Lab (DBBL) conducted an
experiment to determine the impact of the field of view (FOV) of
image intensification night sights on the capability of units
and individuals to perform various Infantry-related tasks. Three
FOVs were investigated: 32, 40, and 60 degrees. The night vision
sights were all monocular and mounted on the soldier's helmet.

The experimental hypothesis/claim was that if the soldier is
taught proper scanning techniques, the narrow field of view
devices will provide the same operational capability as the
larger FOV sights. The payoffs for using the smaller FOV goggles
are reduced weight carried on the helmet, increased image
resolution, and reduced hardware costs. Data were collected on a
variety of Infantry tasks for mounted and dismounted operations.
The measures of effectiveness (MOE) were based on unit and
individual performance. The methods of data analysis were
primarily nonparametric; however, parametric methods were also
used, and the "decisions" resulting from these two approaches
were compared.


The purpose of this paper is twofold: to outline the analytic
methods applied to experimental results obtained at Fort Benning,
Georgia, and to compare the statistical decisions reached by
nonparametric and parametric methods on specific measures of
effectiveness (MOE). The purpose of the experiment was to
determine whether soldier performance of some Infantry tasks
differed when the field of view (FOV) of monocular night vision
goggles (NVG) was varied (32, 40, and 60 degrees). This paper
focuses on the analytical results obtained for each MOE and
presents the pertinent results, along with the statistical
decision for each method of analysis.

The experimental hypothesis: if the soldier is taught proper
scanning techniques, the narrow FOV devices will provide the same
operational capability as the larger FOV devices. If this is
true, then the payoffs to the Army will include reduced weight
on the soldier's helmet, increased image resolution, and reduced
hardware costs.

The system characteristics for the three NVGs are presented
below. Note that the weight of the 60 degree system is nearly
twice that of the two other alternative systems.


Approved  for  public  release;  distribution  is  unlimited. 
SYSTEM CHARACTERISTICS

Characteristic                              32 and 40 Degree           60 Degree
Weight with batteries                       .95 lb                     2.1 lb
Focus range                                 25 mm to infinity          25 mm to infinity
On-axis resolution at optimum light level   1.3 cy/mr                  1.2 cy/mr
Diopter focus                               +2 to -6 diopters          +2 to -2 diopters
Exit pupil                                  10 mm @ 25 mm eye relief   12 mm @ 20 mm eye relief


Overview  of  Training.  In  summary,  the  following  steps  were 
taken  to  train  the  test  subjects.  Each  of  the  subjects  was  an 
Army  soldier. 

1. The test subjects were never told that the FOVs for the three
goggle systems were different.

2. Each subject was taught how to focus each goggle and adjust
the head harness.

3. Each subject walked the indoor Night Fighting Test Facility
(NFTF) at Fort Benning. The facility has lanes established for
the following environments: jungle, woodland, desert, and urban.
Each lane is approximately nine feet wide and 45 feet long.

4. Each subject also received additional training at the NFTF in
these skills: boresighting each goggle, basic maintenance, and
firing an M16 rifle from the standing, foxhole, and prone firing
positions.

5. Finally, each subject went to the outdoor Buckner Range and
was taught the preferred scanning techniques to use during the
conduct of the experiment.

The following table provides the Infantry tasks and their
related measures of effectiveness that were analyzed in this
experiment.

Cross country dismounted movement
  1. Number of navigation errors
  2. Number of targets found
  3. Navigation exercise time
  4. Number of trips/stumbles

Cross country vehicle
  1. Motorcycle exercise time
  2. Ranger special operations vehicle exercise time
  3. Number of cones knocked down

Military operations in urban terrain (MOUT) performance
  1. Time required to clear a room

Target engagement performance
  1. Fraction of target detections
  2. Fraction of targets hit
The experimental results and a summary of the statistical
analysis for each of the ten measures of effectiveness listed in
the table above will be addressed separately in this paper. The
"statistical tools" used to analyze the experimental data are
outlined in the table below.
STATISTICAL TOOLS

* EXPLORATORY DATA ANALYSIS
* PARAMETRIC ANALYSIS OF VARIANCE
* NONPARAMETRIC (KRUSKAL-WALLIS) ANOVA
* CHI-SQUARE (GOODNESS OF FIT AND CONTINGENCY)
* LOG-LINEAR ANALYSIS
* BINOMIAL PROBABILITY CALCULATION

Constraints and statements concerning the experiment and
data analysis:

The subjects were initially assigned at random to each of the
three NVG systems, and the order in which subjects used the
goggles was counterbalanced. However, the following caveats
apply to this experiment and to the results.

a. There were no considerations of statistical power. This is a
consequence of the Advanced Warfighting Experiment (AWE)
philosophy, which is based on "looking for insights" as opposed
to probabilistic decisions concerning experimental hypotheses.

b. In many cases, data were obtained on tactical units instead of
on individual soldiers. Unit analysis has an operational flavor
and appeal that is hard to argue with. However, unit analysis
results in a "small sample" size. This in turn biases the
experiment in favor of the null hypothesis (i.e., toward finding
no statistical differences).

c. An examination of the first chart of this paper clearly indicates that this experiment was a comparison between "systems" and FOV performance rather than a pure FOV comparison. The helmet-mounted 60 degree FOV system was nearly twice as heavy as the other two helmet-mounted systems, which could have biased the results of the experiment.

d.  Some  of  the  MOE  data  elements  were  more  subjective  than 
desired.  The  measures  of  the  "time  to  clear  a  room"  and  the 
count  of  the  number  of  "trips  and  stumbles"  were,  in  retrospect, 
rather  subjective. 

e. There was an unfortunate vehicle accident during the course of the experiment, in which a soldier suffered a broken leg. This incident may have had some influence on the results of the remaining "vehicle" exercises.


RESULTS. The results of the data analysis are presented for each MOE. Multivariate methods were not appropriate because many of the MOE had small numbers of observations. Nonparametric methods were used as the primary means of hypothesis testing. However, in several cases the corresponding parametric test was also conducted to determine whether the statistical decision was preserved. The critical level of significance was set at 10%. The results for each of the ten MOE are presented in the tables below.
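The two tests reported in the tables below (Kruskal-Wallis as the primary test, one-way ANOVA as the parametric check) can be sketched for the three-group case used throughout this section. This is a minimal illustration under stated assumptions, not the authors' analysis software, which is not named; the function names and the example scores are hypothetical. With exactly three groups both reference distributions have closed-form upper tails: H is compared with a chi-square on 2 degrees of freedom (tail exp(-H/2)), and F with an F(2, N - 3) distribution (tail (d/(d + 2F))^(d/2), d = N - 3). Note that the chi-square reference for H is a large-sample approximation and is rough for groups of five or six teams.

```python
import math
from typing import List, Sequence, Tuple


def kruskal_wallis_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Tie-corrected Kruskal-Wallis H and its chi-square (2 df) p-value."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    # Pool all observations, remembering which group each came from.
    pooled = sorted((x, g) for g, grp in enumerate(groups) for x in grp)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0.0
    i = 0
    while i < n:
        # pooled[i:j] is a run of tied values; each gets the average rank.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += midrank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(r * r / len(grp) for r, grp in zip(rank_sums, groups))
    h -= 3.0 * (n + 1)
    correction = 1.0 - tie_term / (n ** 3 - n)
    if correction == 0.0:
        raise ValueError("all observations are tied")
    h /= correction
    # Upper tail of a chi-square with 2 degrees of freedom is exp(-x/2).
    return h, math.exp(-h / 2.0)


def one_way_anova_3(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """One-way ANOVA F statistic and p-value for three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    values: List[float] = [x for grp in groups for x in grp]
    n = len(values)
    grand_mean = sum(values) / n
    means = [sum(grp) / len(grp) for grp in groups]
    ss_between = sum(len(grp) * (m - grand_mean) ** 2 for grp, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    df_within = n - 3
    f = (ss_between / 2.0) / (ss_within / df_within)
    # Upper tail of an F(2, d) variate is (d / (d + 2F)) ** (d / 2).
    return f, (df_within / (df_within + 2.0 * f)) ** (df_within / 2.0)


# Hypothetical per-team scores for the 32, 40, and 60 degree FOV groups.
example = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h, p_kw = kruskal_wallis_3(example)
f, p_anova = one_way_anova_3(example)
print(f"Kruskal-Wallis H = {h:.2f}, P = {p_kw:.3f}")
print(f"ANOVA F = {f:.2f}, P = {p_anova:.3f}")
```

Under the 10% critical level set above, a P value below .10 from either function would be read as a significant FOV effect.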


Task:  Cross  Country  Dismounted  Movement. 

MOE 1 is: Number of navigation errors.

Data  obtained  on  a  tactical  team  basis.  Six  teams  or  six 
observations  per  FOV. 

Navigation  error  is  defined  as  being  off  course  by  greater 
than  five  degrees. 

Exploratory analysis indicated one outlier, which was then removed.


FOV (degrees)      32      40      60

Average results    1.5     .83     1.00

Statistical Decision: Kruskal-Wallis; P = .59

ANOVA; P = .51


Task: Cross Country Dismounted Movement.

MOE 2 is: Number of targets found.

Data obtained on a tactical team basis. Six teams or six observations per FOV.

MOE based on the number of targets found on the navigation course.

One team's data set was removed because the team was far off the course.

FOV (degrees)      32      40      60

Average results    1.25    2.00    1.25

Statistical Decision: Kruskal-Wallis; P = .27

ANOVA; P = .26
Task: Cross Country Dismounted Movement.

MOE 3 is: Time to complete exercise (seconds).

Data obtained on a tactical team basis. Six teams or six observations per FOV.

Exercise time is defined as the time it takes the unit to walk the course.

All observations included in the analysis.

FOV (degrees)      32      40      60

Average results    85.2    68.7    79.8

Statistical Decision: Kruskal-Wallis; P = .39

ANOVA; P = .49

Task: Cross Country Dismounted Movement.

MOE 4 is: Number of trips or stumbles.

Data obtained on a tactical team basis. Six teams or six observations per FOV.

The number of trips or stumbles was recorded for each unit.

All observations were included in the analysis.

FOV (degrees)      32      40      60

Average results    6.5     4.5     5.8

Statistical Decision: Kruskal-Wallis; P = .59

ANOVA; P = .88

Task: Cross Country Vehicle; Motorcycle Exercise Time.

MOE 5 is: Motorcycle exercise time in minutes.

Data obtained per motorcycle operator. Four to five operators per FOV.

The time required to navigate the vehicle course was recorded.

All observations were included in the analysis.

FOV (degrees)      32      40      60

Average results    17.2    14.8    20.6

Statistical Decision: Kruskal-Wallis; P = .31

ANOVA; P = .20
Task: Cross Country Vehicle; Ranger Special Operations Vehicle.


Task:  Cross  Country  Vehicle;  Motorcycle  and  Ranger  Vehicles. 

MOE 7 is: Number of cones knocked over on the course by both the motorcycles and the Ranger special operations vehicle.

Data obtained for each operator. Four to five operators per FOV.

The data were collected, grouped, and analyzed according to the following categories: FOV (three levels: 32, 40, 60 degrees), type of vehicle (two levels: motorcycle and Ranger), and side of vehicle that hit the cone (two levels: left and right).

All  data  included  in  the  analysis. 

These are the enumerated data:

FOV                32    40    60
Type of vehicle    Moto    RSOV
Side of hit        Left    Right
Number of cones knocked over: 34  29  .32  19  H  O  O  O  33  ,27  49

Hierarchical log-linear results are as follows:

* First-order effects are adequate to explain the data. This is reflected in the methodology shown above for each of the categories.

* The "type of vehicle" is the driving factor. This is reflected above in the P value of .00 for type of vehicle. The Ranger special operations vehicle had a significantly greater number of events.

* FOV is the weakest factor. This is reflected in the P value of .32 presented above.
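Testing whether a single first-order effect in a hierarchical log-linear model is zero amounts to asking whether that factor's marginal counts are equal across its levels. The sketch below shows that kind of comparison with Pearson X^2 and the likelihood-ratio G^2; it is an illustration only, since the authors' log-linear software is not named and the cell counts above cannot be recovered reliably. The counts in the example are hypothetical, and only factors with two or three levels (as here) are supported, because the chi-square upper tail then has a closed form.

```python
import math
from typing import Sequence, Tuple


def equal_counts_test(counts: Sequence[int]) -> Tuple[float, float, float, float]:
    """Pearson X^2 and likelihood-ratio G^2 against equal expected counts.

    Returns (X2, G2, p_X2, p_G2) for a factor with 2 or 3 levels.
    """
    k = len(counts)
    if k not in (2, 3):
        raise ValueError("this sketch supports 2 or 3 categories")
    expected = sum(counts) / k
    x2 = sum((c - expected) ** 2 / expected for c in counts)
    # Empty cells contribute 0 to G^2 (the limit of c*log(c/e) as c -> 0).
    g2 = 2.0 * sum(c * math.log(c / expected) for c in counts if c > 0)

    def upper_tail(x: float) -> float:
        if k == 2:  # chi-square with 1 degree of freedom
            return math.erfc(math.sqrt(x / 2.0))
        return math.exp(-x / 2.0)  # chi-square with 2 degrees of freedom

    return x2, g2, upper_tail(x2), upper_tail(g2)


# Hypothetical totals for a two-level factor such as vehicle type.
x2, g2, p_x2, p_g2 = equal_counts_test([30, 10])
print(f"X2 = {x2:.2f} (P = {p_x2:.4f}), G2 = {g2:.2f} (P = {p_g2:.4f})")
```

A full hierarchical log-linear fit would also estimate the interaction terms; this sketch only covers the one-factor marginal comparisons.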
Task: Military Operations in Urban Terrain.

MOE 8 is: Time to clear a room (minutes).

Data obtained on a tactical team basis. Six teams or six observations per FOV.

MOE is determined from the time the team first enters a room until all the enemy are determined to be killed. This is a subjective determination by the subject matter experts and is considered to be "weak".

Exploratory analysis indicated one outlier, which was removed.

FOV (degrees)      32      40      60

Average results    1.26    1.03    1.15

Statistical Decision: Kruskal-Wallis; P = .83

ANOVA; P = .63


Task: Target Engagement Performance.

MOE 9 is: Fraction of available targets detected.

Data were obtained for each individual soldier. Twenty soldiers were used in the experiment. The use of each FOV by each soldier was randomized. It was assumed that each shot at a "target" equaled a detection. There were twenty target opportunities per soldier per FOV, or a total of 400 target opportunities per FOV.

Information on false detections was not available.

The fraction of available targets detected for each FOV = number of detections (or shots) divided by 400.

All data were included in the analysis.

The results for each FOV are:

Prob Detection for FOV of 32 degrees = .558.

Prob Detection for FOV of 40 degrees = .558 (this is not a typo).

Prob Detection for FOV of 60 degrees = .543.

Statistical Decision: There was no statistically significant difference in performance among the three FOVs. The three 95% confidence intervals computed about the point estimates cited above overlapped.
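The overlap check for MOE 9 can be reproduced approximately from the reported estimates and n = 400. The interval method is not stated in the text, so this sketch assumes the normal-approximation (Wald) interval and treats the 400 opportunities as independent trials, which ignores any clustering by soldier. Overlapping intervals are a conservative criterion: two intervals can overlap while a direct two-proportion test is still significant.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Interval = Tuple[float, float]


def wald_interval(p_hat: float, n: int, z: float = 1.96) -> Interval:
    """Normal-approximation (Wald) confidence interval for a binomial proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Point estimates reported for MOE 9, each based on 400 opportunities per FOV.
estimates: Dict[int, float] = {32: 0.558, 40: 0.558, 60: 0.543}
intervals = {fov: wald_interval(p, 400) for fov, p in estimates.items()}
for fov, (lo, hi) in intervals.items():
    print(f"{fov} degrees: {estimates[fov]:.3f}  ({lo:.3f}, {hi:.3f})")
for a, b in combinations(intervals, 2):
    print(f"{a} vs {b}: overlap = {overlaps(intervals[a], intervals[b])}")
```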
Task: Target Engagement Performance.

MOE 10 is: Fraction of detected targets that were hit. This was a different range from the one used for MOE 9.

Data were obtained for each individual soldier. Twenty soldiers were used in the experiment. The use of each FOV system by each soldier was randomized. Each soldier was presented forty targets per FOV system in this target-rich environment. Therefore each FOV system was exposed to 800 (20 X 40) target opportunities for detection and engagement. The targets appeared to "pop up" at random, but their appearance was actually preprogrammed to look random.

The fraction of target hits for each FOV = number of hits divided by the number of detections.

All data were included in the analysis.

The results for each FOV are:

FOV           Prob Hit (Estimate)    95% Confidence Interval
32 degrees    .487                   .452 to .522
40 degrees    .350                   .318 to .382
60 degrees    .367                   .333 to .400

Statistical Decision: The probability of hit for the 32 degree FOV system is greater than for the other two systems; its confidence interval lies entirely above theirs.


SUMMARY.  The  results  for  each  of  the  ten  MOE  were  presented  in 
tables  above  and  should  be  able  to  speak  for  themselves.  The 
following  list  of  results  is  intended  to  be  a  summary  of  the 
results  and  conclusions  across  all  of  the  MOE.  Some  of  these 
general  statements  have  already  been  discussed. 

1. The experimental hypothesis of equal effectiveness of the different FOV systems for the selected Infantry tasks was not rejected for nine of the ten tasks: there was no statistically significant difference in performance for nine out of ten MOE. In MOE 10, the probability of hit for the 32 degree FOV system was statistically better than for the other two systems. However, it bears repeating that the small samples (on a unit or tactical team basis) result in low statistical power. It was also evident that there were physical differences between the systems (such as weight), so pure FOV was confounded with other system parameters.

2. Although there was only one case where the differences in FOV performance were statistically significant (MOE 10), a careful examination of the tables for each MOE shows that the 40 degree FOV system is "best, or tied for best," eight times out of ten (MOE 1, 2, 3, 4, 5, 6, 8, and 9). The probability that a particular one of these FOV systems would be "best, or tied for best," eight out of ten times under the null hypothesis of no difference in performance is .003. This is a rather interesting result. The 40 degree FOV system is the same weight as the 32 degree FOV system and should be more comfortable to wear on the helmet than the larger 60 degree FOV system.
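The .003 figure is consistent with a simple binomial model, which the sketch below assumes: each of the ten MOE is an independent trial, and under the null hypothesis a given system is best with probability 1/3. Both assumptions are simplifications (the same teams and soldiers appear across MOE, and ties make "best, or tied for best" somewhat more likely than 1/3); the authors' exact calculation is not shown.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


N_MOE = 10
P_BEST = 1.0 / 3.0  # assumed chance that a given system is best on one MOE under H0

exactly_8 = binomial_pmf(8, N_MOE, P_BEST)
at_least_8 = sum(binomial_pmf(k, N_MOE, P_BEST) for k in range(8, N_MOE + 1))
print(f"P(X = 8)  = {exactly_8:.4f}")
print(f"P(X >= 8) = {at_least_8:.4f}")
```

Exactly eight successes has probability 180/59049 (about .0030) and eight or more has 201/59049 (about .0034); both round to the .003 cited above.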

3.  The  60  degree  FOV  system  was  "best"  for  MOE  7;  fewer  cones 
were  knocked  down  on  the  driving  course  using  this  system. 

4. The Ranger special operations vehicle was involved in a significantly greater number of events (reference MOE 7).

5.  The  statistical  decisions  were  consistent  when  both 
nonparametric  and  parametric  methods  were  applied  to  the  data. 
This  result  is  not  surprising  when  the  small  sample  sizes  are 
considered. 
ABSTRACT

Decentralized  battlefield  command  and  control  requires  reliable  and  timely  distribution  of  information.  At 
present,  distribution  of  digital  information  is  limited  by  the  low-bandwidth  noisy  channels  inherent  to  combat  net 
radios  and  heavy  traffic  demands,  forcing  commanders  to  make  decisions  from  less  than  timely  information.  In  the 
ideal  communications  network,  each  node  would  be  smart  enough  to  monitor  network  performance  and,  when 
necessary,  adapt  itself  to  make  better  use  of  the  available  bandwidth.  The  adaptive  network  node  would  employ  a 
decision  algorithm  to  modify  configuration,  routing  and  protocol  parameters  based  on  measured  network  performance 
statistics  and  system  requirements.  Our  research  addresses  the  effects  of  noise  and  interference  on  communications 
channels and construction of network protocols that will be effective on the modern battlefield. The approach
emphasizes  use  of  actual  hardware  and  controlled  experimentation  to  explore  alternative  protocols.  This  paper 
describes  a  controlled  laboratory  experiment  in  which  messages  were  passed  over  a  communications  network  using 
the  combination  of  the  Fact  Exchange  Protocol  (FEP),  the  Tactical  Data  Buffers  (TDBs)  and  Single  Channel  Ground 
and  Airborne  Radio  System  (SINCGARS)  Combat  Net  Radios  (CNRs).  It  also  describes  the  suite  of  software  to 
automatically  execute  the  test  design,  and  collect  and  apply  preliminary  data  reduction  procedures  to  baseline 
performance  data  for  the  prototype  communications  network. 

BACKGROUND 

The  primary  means  of  communications  at  low-echelon  fighting  units  has  been  and  continues  to  be  voice  data 
transmitted  by  CNRs.  Gradually,  a  requirement  for  digital  data  transmission  is  being  inserted  into  the  mission  profile. 
Digital  transmissions  allow  for  compression  and  forward  error  correction  and  provide  the  ubiquitous  computer  with 
the  information  it  requires.  With  this  increasing  requirement  for  digital  transmissions,  problems  arise. 

Modern combat net radios are typically line-of-sight, frequency modulation (FM), low-power instruments designed specifically for use at short range. Their bandwidth is very limited, typically 1200-2400 bits per second (bps), although recent improvements in modem technology have pushed these numbers as high as 16 kilobits per second (kbps). These radios are commonly assembled into a single-hop network of 6 to 12 users. Their effective use to date is testimony to the redundancy of human language and the ability of the human brain to extract meaningful data from a noisy signal.

Our research addresses the effects of noise and interference on communications channels and construction of network protocols and procedures that will minimize delay and maximize throughput on the modern battlefield. The networks that are of particular interest to us have nodes with high computing power but weak, noisy, shared communications links. For this reason, our approach to communications emphasizes intelligent processing at each node to
limit  the  amount  of  information  that  must  be  passed  along  the  communications  channel.  Each  node  is  assumed  to  act 
independently  to  improve  the  effectiveness  of  the  information  exchange  between  nodes.  Such  a  system  of  controls 
requires  that  each  node  be  able  to  monitor  the  network  traffic;  decide  whether  performance  is  inadequate;  and,  if  so, 
make  an  appropriate  adjustment  to  the  protocol. 

A series of controlled experiments is being conducted to determine which communications protocol parameters and structural assumptions have the greatest impact on selected performance measures. To accomplish such an objective, a group of computers serving as battlefield nodes must be synchronized, network parameters must be initialized prior to each run, and collected data must be made conveniently accessible to the user. As a result, software that performs the necessary tasks with minimal user intervention was developed.

TEST  CONFIGURATION 

There are three nodes, each of which is a SPARCbook 3. Each contains a communications protocol and a scenario driver. The communications protocol includes data collection functions to log the sending and receipt of messages and acknowledgements (ACKs) as well as information on the queues. The scenario driver provides the communications loading. The nodes are connected, via Ethernet, to a SPARCstation 20 that serves as the data storage and control node. The nodes are connected to SINCGARS CNRs via TDBs, which act as modems between the radios and the terminal equipment. Resistor loads are used in place of antennas to reduce the transmission range.

The TDB interfaces with the computer using RS-232C, and with the SINCGARS using MIL-STD-188(C). Two processing steps are performed on data input to the TDB: 1) any formatting bits, such as start, stop, and parity, are removed so that transmission time is not expended on unnecessary data; 2) the data are stored until the TDB can access the network. The storage capacity is 24 kilobytes. Storing the input data avoids collisions between incoming and outgoing data.

The TDB may process the data to be sent in a number of ways depending upon the setting of various internal and front panel switches. In the simplest mode nothing is done to the data, and it is output at the TDB's raw data rate of 16 kbps. The simplest processing that can be selected uses the Bose-Chaudhuri-Hocquenghem (BCH) code for error detection/correction. Characters are coded in 4-byte groups at a 48/32 rate; in other words, each 32-bit (4-character) block becomes 48 bits after encoding. This encoding reduces the effective throughput to 10.66 kbps. Finally, three modes of forward error correction may be requested. This error correction algorithm consists of retransmitting multiple copies of the data. The first setting causes no forward error correction to be done, i.e., the data is sent once and the effective throughput is still 10.66 kbps. The next setting causes the data to be repeated 5 times and interleaved in a manner designed to spread out burst errors. The effective throughput at this level of redundancy is 2.133 kbps. The last setting causes the data to be repeated 13 times, resulting in an effective throughput of 820 bps. Forward error correction with a redundancy of 5 was selected for this experiment.
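The throughput figures above follow from scaling the 16 kbps raw rate by the 32/48 BCH code rate and dividing by the number of copies sent. The sketch below reproduces that arithmetic; it is an idealized model that ignores any framing, synchronization, or radio key-up overhead, which the text does not quantify, so real throughput would be lower.

```python
RAW_RATE_BPS = 16_000      # TDB raw data rate
BCH_CODE_RATE = 32 / 48    # 32 data bits carried in every 48 transmitted bits


def effective_throughput_bps(repetitions: int) -> float:
    """Throughput after BCH coding and sending each block `repetitions` times."""
    if repetitions < 1:
        raise ValueError("repetitions must be at least 1")
    return RAW_RATE_BPS * BCH_CODE_RATE / repetitions


for reps in (1, 5, 13):
    print(f"{reps:2d} copies: {effective_throughput_bps(reps) / 1000:.3f} kbps")
```

The three settings give about 10.67, 2.133, and 0.821 kbps, matching the values quoted for the TDB.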

The receiving TDB performs the appropriate level of de-interleaving. In those cases where data is repeated, it uses majority voting to resolve differences between redundant blocks, and does BCH decoding resulting in a block of 4 characters. If, for any reason, the characters cannot be identified, the damaged 4-character block is replaced in the output stream with the four characters "@@@@". The data are then passed to the storage buffer, where formatting bits are reinserted, and then output on the RS-232C line to the data processing device. For more details refer to Harris.
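Why repetition plus interleaving helps against burst errors can be sketched as follows. The TDB's actual interleaving pattern and block layout are not described here, so this sketch assumes a simple copy-wise interleave (every block once, then every block again, and so on) of hypothetical 48-bit encoded blocks, with a bitwise majority vote at the receiver. BCH decoding and the "@@@@" substitution are not modeled.

```python
from typing import List

BLOCK_BITS = 48  # one BCH-encoded 4-character block


def repeat_and_interleave(blocks: List[int], copies: int) -> List[int]:
    """Send every block once, then every block again, and so on (assumed pattern)."""
    return [block for _ in range(copies) for block in blocks]


def majority_decode(stream: List[int], n_blocks: int, copies: int) -> List[int]:
    """Recover each block by a bitwise majority vote over its received copies."""
    decoded = []
    for i in range(n_blocks):
        received = [stream[c * n_blocks + i] for c in range(copies)]
        voted = 0
        for bit in range(BLOCK_BITS):
            ones = sum((word >> bit) & 1 for word in received)
            if 2 * ones > copies:
                voted |= 1 << bit
        decoded.append(voted)
    return decoded


blocks = [0x123456789ABC, 0xFFFF00000000, 0x0F0F0F0F0F0F]
sent = repeat_and_interleave(blocks, copies=5)
# A burst corrupts six consecutive transmitted blocks; because copies of the
# same block are spaced apart, no block loses more than two of its five copies.
received = sent[:1] + [0xFFFFFFFFFFFF] * 6 + sent[7:]
print(majority_decode(received, n_blocks=len(blocks), copies=5) == blocks)
```

Had the five copies of each block been sent back to back, the same six-block burst would have wiped out every copy of one block, and the vote could not recover it.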

Figure  1  illustrates  the  test  configuration. 
SOFTWARE CONFIGURATION


The software consists of four parts: the test driver, the data reduction software, the scenario driver, and the communications software.

The test driver is a menu-driven user interface that is written in the C programming language, uses X Windows and Motif, and runs under a UNIX operating system. It coordinates all tasks necessary to execute the experimental design. Before the test driver existed, the experimental design for similar tests was executed manually, which required extra setup time and introduced the possibility of errors during the initialization phase of a test cell.

Among  its  tasks,  the  test  driver  generates  messages  for  the  scenario  driver,  updates  the  factor-level  combinations, 
distributes  the  information  to  the  nodes,  and  synchronizes  the  nodes’  clocks.  In  addition,  it  starts  and  ends  each  test 
cell,  retrieves  all  log  files  from  the  remote  nodes  for  storage  on  the  control  node,  and  computes  network  statistics. 
To  minimize  input  errors,  the  test  driver  runs  all  experimental  combinations  without  human  intervention.  The  software 
is capable of executing independent replications of the design matrix automatically, with each replication using different random numbers, starting in the same initial state, and all statistical counters reset to zero.

The test driver reads information contained in text files to initialize values that may vary depending on the experimental design. These text files contain values that need initialization prior to the test cell, such as: factors and levels of interest; the number of replicates for each test cell; the number of replicates for the center point; the random number seeds to generate the desired message sets or scenarios; the number of tries for each message; the node identification string; and the length of each run. Other values that are initialized are the names of the directories into which the software will store the data, the directories where executable binary files are located, and values that are used by the data
reduction  software.  The  text  files  used  for  initialization  may  be  modified  either  by  editing  the  files  prior  to  running 
the  test  driver  or  by  menu  selection  before  executing  the  experimental  design. 

The  communications  and  scenario  driver  software  on  the  remote  nodes  have  their  own  input  files;  these  also  need 
to  be  updated  prior  to  each  test  cell.  The  control  node  has  a  copy  of  these  input  files,  referred  to  as  template  files,  which 
the  test  driver  updates  and  copies  onto  the  remote  nodes.  Template  files  are  used  whenever  part  of  a  file  needs  to  be 
modified more than once during the test run. Examples of this kind of file are the capabilities input file (cif_nodename), loaded by the communications software to initialize the node's ID, window size, and retry time-out (Figure 2a), and the nodename## file, from which the scenario driver gets the message information to load messages into the communications software.
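The template-file mechanism can be sketched with Python's string.Template. The real layout of the cif_nodename file is not given in the text, so the keys below (node_id, window_size, retry_timeout) and the node names are hypothetical; copying the rendered files to the remote nodes, which the test driver does through UNIX shell procedures, is left out.

```python
import tempfile
from pathlib import Path
from string import Template
from typing import Iterable, List

# Hypothetical layout of a capabilities input file; the real cif format is not given.
CIF_TEMPLATE = Template(
    "node_id $node_id\n"
    "window_size $window_size\n"
    "retry_timeout $retry_timeout\n"
)


def render_cif(node_id: str, window_size: int, retry_timeout: int) -> str:
    """Fill in the per-test-cell values for one node."""
    return CIF_TEMPLATE.substitute(
        node_id=node_id, window_size=window_size, retry_timeout=retry_timeout
    )


def write_cif_files(out_dir: Path, nodes: Iterable[str],
                    window_size: int, retry_timeout: int) -> List[Path]:
    """Write one cif_<nodename> file per node, ready to be copied to that node."""
    paths = []
    for node in nodes:
        path = out_dir / f"cif_{node}"
        path.write_text(render_cif(node, window_size, retry_timeout))
        paths.append(path)
    return paths


with tempfile.TemporaryDirectory() as tmp:
    for p in write_cif_files(Path(tmp), ["node1", "node2", "node3"],
                             window_size=8, retry_timeout=10):
        print(p.name, p.read_text().splitlines())
```

Keeping a single template on the control node and regenerating the per-node copies for every test cell avoids hand-editing files between runs, which is the error source the test driver was built to remove.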

The  test  driver  invokes  UNIX  shell  procedures  to  execute  tasks  on  the  remote  node  such  as  synchronizing  clocks, 
starting  and  ending  the  execution  of  a  test  cell  (Figure  2b),  as  well  as  on  the  control  node,  such  as  copying  files  to  the 
remote  nodes  (Figure  2a)  and  retrieving  log  files  from  the  remote  nodes. 

During  the  execution  of  a  test  cell,  each  node  collects  data  in  a  log  file  local  to  that  node.  The  log  files  contain 
time  tagged  information  on  the  messages  and  ACKs  sent  and  received,  as  well  as  information  on  queues.  The  data 
reduction  software  is  a  set  of  C  programs  that  reformats  log  files  and  computes  network  statistics.  The  test  driver 
executes UNIX shell procedures to invoke the data reduction software. The shell procedures that contain node information are updated using template files. The output of the data reduction software is formatted in a fashion suitable
for  statistical  analysis. 

The  scenario  driver  is  a  C  language  application  that  reads  a  file  of  time  tagged,  preformatted  message  strings 
and  forwards  them  to  the  DFB  at  the  appropriate  times. 

The communications software is a C language application composed of a freeform database management system called the Distributed FactBase (DFB), which communicates with the other DFBs via the FEP. An important concept implemented in the DFB is the ability to automatically initiate predefined actions (rules) upon receipt of new information. These rules ensure that only significant data (as defined by the commander and staff) are transmitted. The FEP is a tactical transport layer protocol that communicates information quickly, concisely, and reliably over unreliable, low-bandwidth CNRs. It is designed to be a connectionless, reliable protocol (it guarantees delivery of messages within certain parameter limits) that uses multicast, overhearing, and other techniques to minimize radio transmissions.
A  data  collection  function  is  provided  by  the  DFB  to  log  information  on  messages,  including  ACKs,  transmitted  and 
received. 
Figure 3 illustrates the software configuration.


Figure 2. Template File with Shell Procedure Interaction: (a) update and copy of input file to the remote node; (b) update of a shell procedure to start a process on the remote node.
Figure 3. Software Configuration
EXPERIMENTAL DESIGN AND ANALYSIS


EXPERIMENTAL  DESIGN 

Experimental  design  provides  a  means  of  deciding  before  any  runs  are  made  which  particular  configurations  to 
examine so that the desired information can be collected with the least amount of testing. Carefully designed experiments
are much more efficient than a “hit-or-miss” sequence of runs in which a number of alternative configurations
are  unsystematically  tried  just  to  see  what  happens. 

When  the  number  of  factors  is  moderate,  a  factor-screening  strategy,  such  as  a  factorial  design,  might  be  able 
to  indicate  which  factors  appear  to  be  important,  and  more  to  the  point,  which  factors  are  irrelevant  and  can  be  simply 
fixed at some reasonable level and omitted from further consideration. The software developed in-house currently supports the fully automated execution of a modified 2^4 factorial design. The four factors selected for testing (retry time-out interval, window size, message arrival rate, and message length) are ones that can be easily modified.

Two levels of each factor were tested in combination with both levels of every other factor, yielding 2^4 = 16 test combinations. The levels of each factor are listed below:

1. Retry time-out (time in seconds a host waits for an ACK before retransmitting the message): 10, 40

2. Window size (number of messages allowed to be sent per host without waiting for an ACK): 8, 50

3. Message arrival rate (per one-hour test cell): 200 per node, 600 per node

4. Message length (in characters): 80, 240

Past experimentation with actual hardware and a tactical communications protocol showed that network behavior is nonlinear in nature. A potential concern with the use of two-level factorial designs is the assumption of linearity in the factor effects. That is not to say that a 2^k design requires perfect linearity; it works quite well even when the linearity assumption holds only very approximately. However, to provide protection against anticipated curvature in the response data, the 2^4 design was augmented with five center points (corresponding to a retry time-out of 25 seconds, a window size of 29, an arrival rate of 400 messages per node, and a message length of 160 characters). The entire experimental design was replicated three times, for a total of (16 + 5) x 3 = 63 test cells.
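The design just described (a 2^4 factorial plus five center points, replicated three times) can be generated as follows. This is a sketch of the design matrix only; the factor keys are hypothetical names, and the run order within a replicate and the random number seeds used by the test driver are not specified in the text, so cells are listed in standard order.

```python
from itertools import product
from typing import Dict, List

# Low/high levels and the center point, as listed in the text.
LEVELS: Dict[str, tuple] = {
    "retry_timeout_s": (10, 40),
    "window_size": (8, 50),
    "arrivals_per_node": (200, 600),
    "message_length_chars": (80, 240),
}
CENTER = {"retry_timeout_s": 25, "window_size": 29,
          "arrivals_per_node": 400, "message_length_chars": 160}


def design_matrix(n_center: int = 5, n_replicates: int = 3) -> List[Dict[str, int]]:
    """2^4 factorial plus center points, repeated n_replicates times."""
    names = list(LEVELS)
    cells = []
    for rep in range(1, n_replicates + 1):
        for combo in product(*(LEVELS[name] for name in names)):
            cells.append({"replicate": rep, **dict(zip(names, combo))})
        for _ in range(n_center):
            cells.append({"replicate": rep, **CENTER})
    return cells


cells = design_matrix()
print(len(cells), "test cells")
```

Each center-point value is the midpoint of its factor's two levels, which is what lets the center runs detect curvature.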


BIRNBAUM-HALL TEST FOR DIFFERENCES AMONG NODES' TIME TO SUCCESS

We wish to determine whether the distribution functions for the time to success data for the three experimental nodes are identical, especially in light of the fact that the hardware representing one of the nodes was equipped with greater memory. The Birnbaum-Hall test has been selected for several reasons: the data consist of exactly three independent samples, each of size n = 63; the random variable, time to success, is continuous, making this an exact test; and, most importantly, the test is consistent against all alternatives.

The null hypothesis is that there is no difference in the probability distributions of time to success among the three nodes, and the alternative is that a difference exists between at least two of the distributions. Although not shown here, the greatest vertical distance between any two of the empirical distribution functions occurs at a time to success of 304ri seconds. This distance is 3/63 = .0476. The critical region of size α = .05 corresponds to all values of the test statistic greater than .2948, the large-sample approximation for the .95 quantile from tables for the Birnbaum-Hall statistic for n > 40. Because .0476 is far below .2948, there is not sufficient evidence to reject the null hypothesis, and we conclude that the nodes do not differ with regard to the probability distributions of time to success.
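The Birnbaum-Hall statistic described above (the largest vertical distance between any two of the three empirical distribution functions) can be computed as in the sketch below. The critical-value helper is an assumption: it encodes 2.34/sqrt(n), which reproduces the .2948 quoted in the text at n = 63, but the tabled source of that approximation is not given here.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import Sequence


def birnbaum_hall_statistic(samples: Sequence[Sequence[float]]) -> float:
    """Largest vertical distance between any two empirical CDFs (equal sample sizes)."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("Birnbaum-Hall assumes equal sample sizes")
    ordered = [sorted(s) for s in samples]
    # The step-function EDFs change only at observed values, so comparing
    # them at every pooled observation finds the supremum.
    points = sorted({x for s in samples for x in s})
    largest = 0
    for a, b in combinations(ordered, 2):
        for x in points:
            largest = max(largest, abs(bisect_right(a, x) - bisect_right(b, x)))
    return largest / n


def approx_critical_value_05(n: int) -> float:
    """Assumed large-sample .95 quantile, 2.34 / sqrt(n); about .2948 at n = 63."""
    return 2.34 / math.sqrt(n)
```

With three samples of 63, a maximum difference of 3 observations between two EDFs gives 3/63, the .0476 reported above, which is well inside the acceptance region.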

Given  the  nodes  appear  to  exhibit  similar  response  behavior,  and  that  performance  of  an  individual  node  is  not 
of  singular  interest,  the  data  for  the  individual  nodes  will  be  combined  into  a  single  collective  set  for  further  exploratory 
analysis. 

EXPLORATORY  DATA  ANALYSIS 

Graphics is both a powerful exploratory data analysis tool for obtaining insight into the structure of data and a diagnostic tool for confirming assumptions or, when assumptions are not met, for suggesting corrective actions.

Many  important  properties  of  the  distribution  of  a  data  set  are  conveyed  by  the  quantile  plot,  including  the  median, 
quartiles,  interquartile  range,  and  other  quantiles  of  interest,  as  well  as  information  about  the  local  density  of  the  data 
and  symmetry. 
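A quantile plot pairs each ordered observation with an f-value, and quantiles such as the median and quartiles are read off it by interpolation. The sketch below assumes the common convention f_i = (i - 0.5)/n with linear interpolation between points; the convention used for Figure 4 is not stated, and the example times are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantile_plot_points(data: Sequence[float]) -> List[Tuple[float, float]]:
    """(f-value, ordered value) pairs, using f_i = (i - 0.5) / n."""
    ordered = sorted(data)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]


def quantile(data: Sequence[float], f: float) -> float:
    """Read the f quantile off the quantile plot by linear interpolation."""
    points = quantile_plot_points(data)
    if f <= points[0][0]:
        return points[0][1]
    if f >= points[-1][0]:
        return points[-1][1]
    for (f0, x0), (f1, x1) in zip(points, points[1:]):
        if f0 <= f <= f1:
            return x0 + (f - f0) / (f1 - f0) * (x1 - x0)
    raise AssertionError("unreachable for 0 < f < 1")


times = [12.0, 40.0, 55.0, 70.0, 71.0, 90.0, 310.0, 880.0]  # hypothetical seconds
median = quantile(times, 0.5)
iqr = quantile(times, 0.75) - quantile(times, 0.25)
print(f"median = {median:.1f} s, interquartile range = {iqr:.1f} s")
```

Plotting the ordered values against their f-values gives the quantile plot itself; long gaps between consecutive large values are what make right-skewed delay data look convex.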

A  preliminary  look  at  the  aggregate  set  of  the  nodes'  time  to  success  data  is  provided  by  the  quantile  plot  in  Figure 
4.  For  this  empirical  data  set,  we  see  that  the  median  is  about  70  seconds  and  that  a  large  fraction  of  the  observed  values 
lies  between  25  seconds  and  100  seconds.  The  longest  time  to  success  is  in  the  neighborhood  of  900  seconds,  with 
a  total  of  36  observations  greater  than  200  seconds. 

The  data  exhibit  remarkably  flat  behavior  below  the  .82  quantile,  indicative  of  the  local  density,  or  concentration, 
of  the  data.  This  is  revealed  on  the  quantile  plot  by  the  string  of  nearly  horizontal  points. 

The quantile plot may also be used to examine the data set for symmetry. If the data were symmetric, the values
in the upper portion of the plot would stretch out toward the upper right quadrant in the same fashion as the values
in the lower half stretch out toward the lower left quadrant. The observations in Figure 4 are skewed toward large
values. Small values are tightly packed together; the large values stretch out and cover a much wider range of the
measurement scale. The skewing increases dramatically as we go from small to large values, resulting in a strongly
convex pattern. This is anticipated with network delay data.
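The visual symmetry check described above can also be expressed numerically. The sketch below, an illustrative assumption rather than the analysis actually performed, pairs the distance of the i-th smallest value below the median with the distance of the i-th largest value above it, as in a symmetry plot. For symmetric data the two distances match; a ratio well above 1 indicates skew toward large values. The data are hypothetical.

```python
import statistics


def symmetry_pairs(data):
    """Pair (median - i-th smallest, i-th largest - median) for each i."""
    xs = sorted(data)
    med = statistics.median(xs)
    n = len(xs)
    return [(med - xs[i], xs[n - 1 - i] - med) for i in range(n // 2)]


def upper_to_lower_spread(data):
    """Ratio > 1 suggests skew toward large values; about 1 suggests symmetry."""
    pairs = symmetry_pairs(data)
    lower = sum(lo for lo, _ in pairs)
    upper = sum(hi for _, hi in pairs)
    return upper / lower  # assumes the lower half is not all equal to the median


times = [22, 31, 48, 65, 70, 74, 90, 110, 240, 880]  # hypothetical delays, seconds
ratio = upper_to_lower_spread(times)
```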

Figure 5 displays the frequency of messages acknowledged as a function of try number for all 63 test cells. The
communications protocol dictates that once a message is sent, it is retransmitted if it is not acknowledged. Each
transmission was considered a "try". In this experiment, the protocol was configured to retransmit up to two times,
yielding a total of three possible tries to transmit one message. The message was discarded if an ACK was not received
after three tries.
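The retry rule above can be sketched as a small simulation. It assumes, purely for illustration, that each try is acknowledged independently with a fixed probability `ack_prob` (a hypothetical parameter, not a measured value). Under that assumption the count of messages acknowledged on try k falls off geometrically, p(1 - p)^(k - 1), with the remainder (1 - p)^3 discarded.

```python
import random
from collections import Counter

MAX_TRIES = 3  # one original transmission plus up to two retransmissions


def send_with_retries(ack_prob, rng, max_tries=MAX_TRIES):
    """Return the try number on which an ACK arrived, or None if discarded."""
    for attempt in range(1, max_tries + 1):
        if rng.random() < ack_prob:  # assumed independent ACK chance per try
            return attempt
    return None  # no ACK after max_tries: the message is discarded


def try_distribution(n_messages, ack_prob, seed=0):
    """Count messages by the try on which they were acknowledged (None = failed)."""
    rng = random.Random(seed)
    return Counter(send_with_retries(ack_prob, rng) for _ in range(n_messages))


def expected_fractions(ack_prob, max_tries=MAX_TRIES):
    """Geometric-model fractions: try k gets p * (1 - p)**(k - 1)."""
    fracs = {k: ack_prob * (1 - ack_prob) ** (k - 1) for k in range(1, max_tries + 1)}
    fracs[None] = (1 - ack_prob) ** max_tries
    return fracs


counts = try_distribution(10_000, ack_prob=0.9)
```

In this idealized model most messages are acknowledged on the first try, so a try-number distribution that instead piles up at "failed" is a strong sign that something other than random loss is at work.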

From Figure 5, one can see that more than 50% of the messages either failed (i.e., were not acknowledged within three
tries) or were never transmitted because the window was full, in which case the messages were simply discarded. The
trend exhibited by this distribution of messages is a mirror image of what should be generated by a network process
under control. The information extracted from this plot was enough to warrant further investigation of the FEP and
the DFB and to halt further testing and analysis.


Figure 5. Distribution of messages by try number (horizontal axis: number of tries).
SUMMARY


The major problem identified by the pilot test was the FEP's failure to match outstanding messages with returning
ACKs. This problem arose only when several messages were awaiting ACKs, and it caused the number of outstanding
messages to grow until the window size was eventually exceeded. This failure to match ACKs had two effects: 1)
each message was transmitted the maximum number of times, greatly reducing total throughput; and 2) once the window
size was exceeded, all transmissions stopped.
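The failure mode just described can be reproduced with a toy sliding-window sender. This is a hypothetical model, not the FEP code: one new message is offered per time step, each ACK returns `ack_delay` steps later, and the assumed bug is that an arriving ACK is not matched whenever more than one message is outstanding. Retransmissions are left out to keep the sketch short.

```python
def run_window(n_steps, window_size, ack_matching_bug, ack_delay=2):
    """Toy sliding-window sender: one new message per time step, each ACK
    returning ack_delay steps later. Returns (sent, blocked)."""
    outstanding = set()   # messages awaiting an ACK
    pending_acks = []     # (arrival_step, msg_id)
    sent = blocked = 0
    for step in range(n_steps):
        arriving = [m for t, m in pending_acks if t == step]
        pending_acks = [(t, m) for t, m in pending_acks if t != step]
        for msg_id in arriving:
            if ack_matching_bug and len(outstanding) > 1:
                continue  # assumed failure: ACK not matched when several are outstanding
            outstanding.discard(msg_id)
        if len(outstanding) >= window_size:
            blocked += 1  # window full: nothing can be transmitted
            continue
        outstanding.add(step)  # use the step number as the message id
        pending_acks.append((step + ack_delay, step))
        sent += 1
    return sent, blocked
```

With correct matching, the number of outstanding messages never exceeds the ACK delay and nothing is blocked; with the bug, the outstanding set only grows, fills the window after a few messages, and all later transmissions stop, matching effect 2) above.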

The  template  files  are  useful  in  simplifying  the  programmer’s  job  when  the  experimental  configuration  requires 
modification.  Their  use  allows  fast  and  easy  modification  to  the  experimental  configuration  since  the  input  is  not 
“hard  wired”  into  the  code.  For  instance,  if  the  number  of  nodes  needs  to  be  increased  or  decreased,  the  programmer 
modifies  the  input  text  files  containing  node  information  and  the  updates  on  the  remote  software  take  place  during 
the  test  driver  initialization  phase. 
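To make the idea of template-driven configuration concrete, the sketch below reads node information from a plain-text file at initialization instead of hard-wiring it into the code. The file layout (one `name host port` line per node, `#` for comments) and the `Node`/`load_nodes` names are hypothetical; the paper does not describe the actual template format.

```python
import io
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    host: str
    port: int


def load_nodes(text_file):
    """Parse 'name host port' lines, skipping blank lines and '#' comments."""
    nodes = []
    for line in text_file:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, host, port = line.split()
        nodes.append(Node(name, host, int(port)))
    return nodes


# Hypothetical node file; adding or removing a node only means editing this text.
NODE_FILE = """\
# name   host    port
node1    hostA   5001
node2    hostB   5001
"""
nodes = load_nodes(io.StringIO(NODE_FILE))
```

In practice `load_nodes` would be given `open("nodes.txt")` (a hypothetical file name) during the test driver's initialization phase.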

Because the test driver is of a general nature, it can be used in a variety of situations to run experiments in a
distributed UNIX environment.

It is anticipated that future experiments can be automated to consider more complex communications protocol
modifications. Automating the process reduces the chance of operator error and simplifies the execution of the
experimental design.
ABSTRACT

Continuous data lend themselves easily to graphical display, but analogous displays for discrete data such as
hit/miss data are not so readily available. Nominal logistic regression can produce an estimate of success from the
underlying regression model for each cell in the underlying contingency table. Since logistic regression uses
maximum likelihood to fit logarithms of odds ratios, the estimates produced are strictly between zero and one, even
for cells with only one observation. Thus the underlying model enables one to transform discrete data into more
nearly continuous "synthetic proportions" for analysis. Ordinary least squares regression can then be used to
manipulate estimates into useful marginal estimates. Alternatively, graphical methods such as dotplots or boxplots
can usefully display distributions of the synthetic proportions. Examples from small arms hit/miss data are used to
illustrate promising techniques.


INTRODUCTION 

This paper grew out of work done early in 1996 at the U.S. Army Test and Experimentation Command
(TEXCOM), Experimentation Center (TEC), Fort Hunter Liggett, California. A proposed new sighting device for
the M16 rifle and M4 carbine was tested. This device was supposed to improve the speed at which soldiers could
fire on targets without degrading the probability of hitting the targets. The experiment was [...] sort of design but
with lots of nesting. The sighting device had slightly different versions for the M16 and M4 WEAPONS, and the
standard "iron" sight was included as a baseline. Two different manufacturers submitted candidates, giving a total
of three SIGHTS (CANDA, BASELINE, and CANDB), considered to be nested in WEAPON. Twenty soldiers (ROSTER)
executed six firing tables (TABLE, labeled TAB1-TAB6 but actually corresponding to NBC, wide view, standard record
fire, etc.), which consisted of firing rounds [...] range bands (RANGE, bands from 50 m to 300 m) that varied by
TABLE (therefore nesting RANGE in TABLE). Soldiers fired a total of 18,960 shots (including some multiple shots at
the same targets) under 3890 combinations of TABLE, RANGE, ROSTER, WEAPON, and SIGHT. Between one and eleven
shots were fired under each combination of conditions, and both the times of shots (from audio) and the number of
hits were recorded for each combination of conditions. Analysis of the time data was relatively easy using ordinary
Analysis of Variance (ANOVA), and in the end the analysis could be easily displayed using boxplots without ever
referring to the ANOVA (see Figure 1). That analysis will not be discussed further in this paper. Instead, this paper
discusses analyses of the hit/miss data, which also yield graphical presentations.

ANALYSIS 


Figure 2 shows a simple attempt to produce boxplots of hit/miss data. Even though the horizontal plot
position of each data point is "fuzzed" by adding random error to alleviate overplotting, the plot is unhelpful with
this much data. Thus a more sophisticated approach is needed.

Nominal logistic regression is an analog to ANOVA for hit/miss data. The "odds ratio" (ODDR) corresponding
to a test condition is the ratio of the probability of hit to the probability of miss under that condition; that is,

$$\mathrm{ODDR} = \frac{\mathrm{Prob}[Y=\mathrm{Hit}]}{\mathrm{Prob}[Y=\mathrm{Miss}]}. \qquad (1)$$


In this paper, ODDR is also used empirically and somewhat ambiguously to refer to the ratio of the proportion
of hits to the proportion of misses under a particular condition. The logarithm of ODDR, ln(ODDR), has the nice
symmetric and asymptotic properties desirable for classical linear model building:


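$$
\begin{aligned}
\ln(1/\mathrm{ODDR}) &= -\ln(\mathrm{ODDR}) \\
\ln(\mathrm{ODDR}) &\to -\infty \quad \text{as } \mathrm{Prob}[Y=\mathrm{Hit}] \to 0 \\
\ln(\mathrm{ODDR}) &\to +\infty \quad \text{as } \mathrm{Prob}[Y=\mathrm{Hit}] \to 1
\end{aligned}
\qquad (2)
$$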


Approved for public release; distribution is unlimited.




Figure 1. Boxplots of firing times. (Panels: M16 rifle and M4 carbine; sights CANDA, BASELINE, and CANDB in each panel.)


Nominal logistic regression iteratively fits a multiplicative model for changes in ODDR through the
loglinear model ln(ODDR) = Xβ. Two simple exponentiated formulas then let one get back to estimates
of hit and miss probabilities from the estimates of ln(ODDR):

$$
\begin{aligned}
\mathrm{Prob}[Y=\mathrm{Miss}] &= \frac{1}{1 + e^{\ln(\mathrm{ODDR})}} \\
\mathrm{Prob}[Y=\mathrm{Hit}] &= \frac{e^{\ln(\mathrm{ODDR})}}{1 + e^{\ln(\mathrm{ODDR})}}
\end{aligned}
\qquad (3)
$$

Figure 2. Boxplots of hit proportions (not very helpful). (Panels: M16 rifle and M4 carbine; sights CANDA, BASELINE, and CANDB in each panel.)
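Formulas (1) through (3) amount to the log-odds (logit) transform and its inverse. The short Python sketch below, with illustrative function names that are not from the paper, makes the round trip explicit and checks the symmetry property ln(1/ODDR) = -ln(ODDR) by swapping the roles of hit and miss.

```python
import math


def log_odds(p_hit):
    """ln(ODDR) from formula (1): ln(Prob[Hit] / Prob[Miss])."""
    return math.log(p_hit / (1.0 - p_hit))


def prob_hit(ln_oddr):
    """Prob[Y=Hit] from formula (3)."""
    return math.exp(ln_oddr) / (1.0 + math.exp(ln_oddr))


def prob_miss(ln_oddr):
    """Prob[Y=Miss] from formula (3)."""
    return 1.0 / (1.0 + math.exp(ln_oddr))
```

Because `prob_hit` never returns exactly 0 or 1 for finite ln(ODDR), estimates built this way stay strictly inside (0, 1). Note that `math.exp` overflows for arguments above roughly 709, so very large fitted log-odds would need a numerically safer form.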

In this paper, nominal logistic regression is used to analyze the hit/miss data using the model reflected in Table
1. Because of the relatively complicated nesting, the factor of interest (SIGHT[WEAPON]) could not even start to
be addressed until the other, more influential factors of TABLE, RANGE, and ROSTER and their interactions (or
non-interactions) with WEAPON had been accounted for. Not surprisingly, with such a large amount of data some
"statistically significant" effects involving SIGHT turn up, but they are clearly small compared to those of the more
influential factors. Nevertheless, some of the apparent effects could be operationally important. If the data were
continuous, a likely next step would be to look at the Least Squares Means (LSMs), because those are what are really
being tested in Table 1. Fortunately, an analog to LSMs for count data can easily be obtained from computer
statistics packages such as the SAS® JMP® package, which was used for this analysis.

Table 1. Statistical Summary of Hit Performance

Source                       DF     ChiSq   Prob>ChiSq   ChiSq per DF
TABLE                         5    187.73       0.0000           37.5
RANGE[TABLE]                 26   2269.29       0.0000           87.3
ROSTER                       19    550.70       0.0000           29.0
WEAPON                        1      0.04       0.8399            0.0
TABLE*WEAPON                  5      4.63       0.4631            0.9
WEAPON*RANGE[TABLE]          26     33.94       0.1366            1.3
WEAPON*ROSTER                19     65.00       0.0000            3.4
SIGHT[WEAPON]                 4      3.76       0.4401            0.9
TABLE*SIGHT[WEAPON]          20     46.03       0.0008            2.3
SIGHT*RANGE[TABLE,WEAPON]   104    148.53       0.0027            1.4

Multiway contingency table analysis was performed on hit/miss data (18,960 shots) using nominal
logistic regression as implemented in the SAS® JMP® statistical package (version 3.1). The overall [...]


AN  ANALOG  TO  LEAST  SQUARES  MEANS 

Once the fit ln(ODDR) = Xβ is obtained from logistic regression, one can theoretically string the vector β of
model parameter estimates together, just as SAS PROC GLM or JMP does in OLS regression, to obtain LSMs. If
you've ever tried to do that, you've undoubtedly found that a computer does a much better job than a person. Luckily,
JMP offers to produce and retain estimates of ln(ODDR) for each of the 3890 rows in the underlying contingency
table, which in this case produces the estimates portrayed in the giant linear combination in Figure 3. Call this
vector of estimates X, and consider what happens when OLS regression is used to fit X using the same X model
used to obtain X. Except for computational error, the fit will be perfect, since OLS regression is simply undoing the
perfect linear fit coded into X. So all statements concerning statistical significance are meaningless, but the
estimated LSMs are fine, and they can be journaled to a word processing file and then transferred into a spreadsheet
to give a table such as Table 2, where the LSMs of ln(ODDR) can be translated back to marginal "LSM" estimates
of Prob[Y=Hit] using formula (3). The spreadsheet data can then be used for tables of estimates and plots such as
the one in Figure 4. Figure 4 suggests that although differences in hit performance between sights were generally
not large as a function of range, performance of CANDB tended to fall off faster with range. Since there is a
physical explanation for such an increased fall-off, the plot proved to be helpful.

Figure 3. Example of ln(ODDR) estimates for each row of the underlying contingency table. (Excerpt of a long JMP
formula built from terms of the form "match WEAPON: +c when M16, -c when M4, 0 otherwise", e.g., c = 0.17011561.)

Table 2. LSMs from OLS regression on ln(ODDR) using the same model as in logistic regression (extract).

Level                        LSM ln(ODDR)   Std Error
[TAB1,M16]075,CANDA          (illegible)    0
[TAB1,M16]075,BASELINE       (illegible)    0
[TAB1,M16]075,CANDB          9.26           0
[TAB1,M16]200,CANDA          1.54           0
[TAB1,M16]200,BASELINE       0.96           0
[TAB1,M16]200,CANDB          0.84           0
[TAB1,M16]300,CANDA          0.53           0
[TAB1,M16]300,BASELINE       0.14           0
[TAB1,M16]300,CANDB          -0.42          0
[TAB1,M4]075,CANDA           3.48           0
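The "OLS undoes the logistic fit" argument can be checked on a toy example. The sketch below assumes a hypothetical additive model with three SIGHT levels and three range levels, treatment-coded, and made-up coefficients standing in for a logistic-regression fit. Refitting the fitted ln(ODDR) vector by OLS with the same design matrix reproduces it exactly (hence zero standard errors, as in Table 2), and an LSM-style marginal estimate is then the average fitted ln(ODDR) over range levels, back-transformed with formula (3). JMP's own LSM computation for the nested model is not reproduced here.

```python
import itertools

import numpy as np

SIGHTS = ["CANDA", "BASELINE", "CANDB"]
RANGES = [75, 200, 300]  # hypothetical range levels, m


def design_row(sight, rng):
    """Intercept plus treatment-coded dummies (first level is the reference)."""
    row = [1.0]
    row += [1.0 if sight == s else 0.0 for s in SIGHTS[1:]]
    row += [1.0 if rng == r else 0.0 for r in RANGES[1:]]
    return row


cells = list(itertools.product(SIGHTS, RANGES))
X = np.array([design_row(s, r) for s, r in cells])
beta = np.array([3.0, -0.3, -0.6, -1.5, -2.5])  # hypothetical logistic fit
ln_oddr = X @ beta  # one ln(ODDR) estimate per contingency-table cell

# OLS on the fitted values with the same design matrix: a perfect fit.
beta_ols, *_ = np.linalg.lstsq(X, ln_oddr, rcond=None)
fitted = X @ beta_ols


def lsm(sight):
    """Marginal ln(ODDR) for one sight, averaged over the (balanced) range levels."""
    return float(np.mean([fitted[i] for i, (s, _) in enumerate(cells) if s == sight]))


# Back-transform each marginal ln(ODDR) to Prob[Y=Hit] with formula (3).
lsm_prob = {s: float(np.exp(lsm(s)) / (1.0 + np.exp(lsm(s)))) for s in SIGHTS}
```

Note that back-transforming the averaged ln(ODDR) is not the same as averaging probabilities; the sketch follows the text in averaging on the ln(ODDR) scale first.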


SYNTHETIC  PROBABILITIES 


Figure 4 shows that graphical displays of effect estimates can be used to aid interpretation of logistic regression
results in a manner similar to the way LSMs can be used to interpret ANOVA tables for continuous data. Although
often helpful, displays such as Figure 4 suffer the problem common to all such displays of point estimates: there is
no indication of spread and sample size. Once the vector X of ln(ODDR) estimates is available, formula (3) can be
applied to each row in the underlying contingency table to produce several types of interesting dot- and boxplots
which help alleviate this problem. The key is that formula (3) produces, for each of the 3890 rows in the
underlying contingency table, an estimate PROBHIT for Prob[Y=Hit] in that row, which is based not only on
PROPHIT = HITS/PRES for that row ("PRES" is the number of presentations, which varied from 1 to 11, with a
mode of 5 or 6 having 840 presentations each) but also on many of the other 3889 rows via the model of Table 1.
These PROBHIT estimates are analogous to the "predicted values" of OLS regression. But for count data they can
be regarded as "synthetic proportions," since they take the very coarse values of PROPHIT and smooth them via a
complicated function (Figure 3) into much more continuous estimates suitable for graphical display. As a first
example, the ordinary boxplots of synthetic proportions in Figure 5 produce a more satisfactory display than the
earlier Figure 2. With the amount of data in this example, however, even Figure 5 is too dense to be entirely
pleasing. Multiway plots can spread out the data, alleviating this overly dense plotting. In fact, recent work at
Bell Labs has developed interesting tabular displays of graphical analyses called "trellis graphics," which are
implemented in the newest versions of the "S" language. These displays permit flexible tabular display of multiway
data, automatically smoothed or repackaged via a command language. Similar displays can be produced more clumsily
outside S by carefully recoding horizontal and vertical plotting parameters in the data. In particular, the display
of LSMs in Figure 4 can be replaced by the multiway display of dotplots in Figure 6. Compared to Figure 4,
Figure 6 gives a deeper understanding of what is going on in the data, and it raises some disturbing questions,
since it is clear that hit performance is very closely grouped at short ranges.
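Recoding plotting positions in the data, as mentioned above, can be sketched as follows. Each contingency-table row is mapped into one shared plotting frame: the range band selects a panel column, the firing table selects a panel row, and the sight selects a jittered column within the panel. The panel sizes, the jitter width, the assignment of range to columns and table to rows, and the sample rows are all assumptions for illustration.

```python
import random

SIGHT_ORDER = {"CANDA": 0, "BASELINE": 1, "CANDB": 2}
PANEL_WIDTH = 4.0   # three sight columns plus one column of blank space
PANEL_HEIGHT = 1.2  # probabilities span [0, 1]; leave a gap between panel rows


def plot_coords(range_idx, table_idx, sight, probhit, rng, jitter=0.3):
    """Map one contingency-table row to (x, y) in a single shared plotting frame."""
    x = range_idx * PANEL_WIDTH + SIGHT_ORDER[sight] + rng.uniform(-jitter, jitter)
    y = table_idx * PANEL_HEIGHT + probhit
    return x, y


# (range index, table index, sight, PROBHIT); values are made up.
rows = [(0, 0, "CANDA", 0.97), (0, 0, "CANDB", 0.96), (2, 1, "BASELINE", 0.61)]
rng = random.Random(1)
points = [plot_coords(r, t, s, p, rng) for r, t, s, p in rows]
```

Any ordinary scatterplot program can then draw `points` directly, with panel labels added by hand; this is the "clumsier" route compared with trellis graphics.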
Figure 5. Boxplots of synthetic hit proportions
(more helpful than Figure 2).

Figure 6. Synthetic hit proportions by range and firing table for one WEAPON.

(Left to right, the jittered point clouds correspond to CANDA, BASELINE, and CANDB.)
RESIDUALS AND RESCALING


Although synthetic proportions permit graphical display of hit/miss data in a richer manner than the true hit
proportions do, they rely heavily on the underlying model. To assess the model dependence, some sort of residual
plot is desirable, and a natural definition of residuals for plotting is

$$\mathrm{RESIDUAL} = \frac{\mathrm{HITS} - \mathrm{PRES}\cdot\mathrm{PROBHIT}}{\sqrt{\mathrm{PRES}\cdot(1-\mathrm{PROBHIT})\cdot\mathrm{PROBHIT}}} \qquad (4)$$

where "PRES" is the number of presentations. With this definition, the residual plot in Figure 7 is easy to obtain.
Clearly something fishy is going on at short range and occasionally at long range. A few moments' reflection is
enough to guess the problem. With only a few presentations per cell and a very high probability of hit at close
range, one would expect PROBHIT estimates to be very near 1 at short range, so that residuals would be quite large
(and negative) in any contingency table cell without perfect hit performance. Likewise, long-range cases with
relatively small hit probabilities could be expected to have some large positive residuals. The additive definition of
residuals is not entirely satisfactory, since the underlying model is multiplicative. However, the author does not
know of a good way to formulate multiplicative residuals which accommodates the actual zeros and ones in the
observed hit proportion data.
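Formula (4) is the standardized (Pearson) residual for a binomial count. The sketch below, using hypothetical cell values, shows why short-range cells produce large negative residuals: with only two presentations and PROBHIT = 0.99, a single miss yields a residual near -7, while a perfect score yields only a small positive one.

```python
import math


def pearson_residual(hits, pres, probhit):
    """Formula (4): (HITS - PRES*PROBHIT) / sqrt(PRES*(1-PROBHIT)*PROBHIT).
    Undefined when PROBHIT is exactly 0 or 1."""
    expected = pres * probhit
    return (hits - expected) / math.sqrt(pres * (1.0 - probhit) * probhit)


# Hypothetical short-range cell: 2 presentations, model PROBHIT = 0.99.
one_miss = pearson_residual(hits=1, pres=2, probhit=0.99)  # about -6.96
no_miss = pearson_residual(hits=2, pres=2, probhit=0.99)   # about 0.14
```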
Figure 7. Residuals of synthetic hit proportions by range and firing table for one WEAPON.

(Left to right, the jittered point clouds correspond to CANDA, BASELINE, and CANDB;
the horizontal lines represent overall means.)

Plotting the original odds ratios (ODDR) on a log scale yields the annotated plot in Figure 8. Comparing
Figure 8 with Figure 7 confirms that very high or low probabilities and small numbers of presentations (Figure 8)
are associated with the large residuals in Figure 7. This plot also shows that there are two short-range cases which
may have skewed the results and been influential in the logistic regression fit. One case involved presentations of
only two targets, where nineteen soldiers hit both while one soldier hit only one. The other involved presentations
of six targets in which all twenty soldiers hit all six. The author does not know exactly how much these two
apparently influential points affected the overall fit and significance statements presented in Table 1.

Figure 8. Rescaled plot of original odds ratio estimates on logarithmic scale
(area of circles proportional to number of presentations).

SUMMARY,  CONCLUSIONS  AND  DIFFICULTIES 

Clearly, graphical techniques can contribute to the understanding of count data analyzed via nominal logistic
regression. Such techniques can provide substantial insight into both the model fit and the original data set. The
notion of synthetic proportions helps a lot in providing helpful displays, since it provides statistics which can be
displayed and analyzed like continuous data. However, synthetic proportions rely heavily on the underlying model,
they yield no really good residuals, and there is no clear path to influence diagnostics. Furthermore, the right plots
are based on ln(ODDR), not PROBHIT (synthetic proportions). Finally, both the techniques and the graphics are
borderline in both memory and processing time for most PCs; the word processing file in which this paper resides
is 5.6 MB in size, the number of objects in some graphics exceeded the 32K-object limitation in my graphics
editing program, and my printer had to be tricked into printing some pages. But these computational limitations are
quickly disappearing. Despite the difficulties, the bottom line is that graphical techniques can be used effectively to
provide insight into analysis of count data, just as they are used effectively for continuous data.
ABSTRACT

Since the inception of the Strategy-to-Task evaluation framework, originally suggested by RAND's Lt. Gen. Glenn
A. Kent, the Operational Test and Evaluation community has been struggling with how to implement it. The
top-down definition of the hierarchical structure linking high-level objectives and tasks to the functional performance
that a system must demonstrate has been successfully accomplished. However, a methodology through which the
functional performance level data gathered during testing can flow back up through the hierarchy, being aggregated
and synthesized to provide truly meaningful information to the decision-maker, has proved elusive. This paper
describes an Intelligent Hierarchical Decision Architecture that uses fuzzy set theory as well as the Dempster-Shafer
Theory of Evidential Reasoning to take functional performance level data as input and provide a probabilistic bound
on the system performance at the operational task level as output.

INTRODUCTION 

The  Strategy-to-Task  evaluation  framework,  originally  suggested  by  RAND’s  Lt.  Gen.  Glenn  Kent  (Kent  &  Simon, 
1991),  was  eagerly  adopted  by  the  operational  testing  community  as  a  means  to  link  low-level  functional 
performance  information  about  a  system,  gathered  during  a  testing  effort,  to  high-level  operational  tasks  and 
objectives that a system needs to be able to accomplish. Kent's hierarchical evaluation framework requires that
high-level objectives be defined first; underlying objectives and operational tasks are then outlined. Once the
system's operational tasks are defined, the functional performance characteristics that a system must meet to
accomplish those operational tasks are determined. This top-down definition of objectives to tasks to functional
performance  characteristics  has  been  accomplished  in  many  operational  testing  programs.  What  has  been  lacking  is 
a  methodology  through  which  the  functional  performance  level  data  gathered  during  the  testing  effort  can  be 
aggregated  and  synthesized  to  flow  back  up  the  strategy-to-task  hierarchy  to  the  operational  task  level,  where  it  can 
provide  meaningful  information  to  the  acquisition  decision-maker. 

Current  analysis  methods  used  by  the  Operational  Test  and  Evaluation  (OT&E)  community  are  limited  to  standard 
statistical  methods  and  a  limited  use  of  Modeling  and  Simulation  (M&S).  Although  both  have  proved  inadequate  in 
providing  information  to  the  decision-maker  at  the  operational  task  level,  they  continue  to  be  used,  and  in  fact, 
endorsed as the preferred analysis methods. The currently used statistical methods, such as statistical hypothesis
testing, analysis of variance, design of experiments, and nonparametric statistics, offer a means of summarizing the
information gathered during the testing efforts, but do not provide a method for extrapolating the data to higher
information levels. Statistical model-building techniques, such as regression analysis and time series analysis,
provide a means to predict future performance once a model of a process is built; however, in most cases in the
OT&E arena, sufficient data do not exist to build these models. M&S using the "legacy models" has been suggested
as a means for answering questions at higher information levels; however, the M&S solution offers its own
dilemmas. For example:
•  In order for the models at the higher level (i.e., mission-level or campaign-level models) to run in a reasonable
amount of time, many simplifications were made in their development. These simplifications preclude them
from being used as detailed analysis tools.

•  Each  of  the  legacy  models  was  developed  by  a  different  organization  for  a  different  purpose.  There  was  no 
thought  given  to  an  architecture  that  would  tie  these  models  together  until  long  after  the  models  were  already 
developed. 

•  Finally, the issue of verification, validation, and accreditation (VV&A) of these models is one that is just now
beginning to receive attention. No systematic mechanisms or databases are readily available to allow the analyst
to determine a model's applicability to the task at hand.

Other  modeling  techniques,  such  as  Monte-Carlo  simulation,  can  be  employed  to  draw  conclusions  at  the 
operational  task  level  from  the  functional  performance  level  data  if  transformations  between  the  two  information 
levels are known in functional form. However, in most cases these functional transformations do not exist, which
severely limits the use of these methods. After an initial analysis of all of the statistical and analytical methods
used  in  OT&E,  the  National  Research  Council  affirmed  the  inadequacy  of  the  current  analysis  methods,  when  they 
listed  the  four  aspects  of  operational  testing  contributing  to  its  difficulty  and  complexity  (National  Research  Council, 
1995): 

•  statistical methods meant for making one-at-a-time pass/fail decisions are inappropriate for OT&E
decision-making problems

•  OT  involves  realistic  engagements  where  factors  which  cannot  be  controlled  affect  the  testing  outcome 

•  OT is expensive, so testing frequently yields sparse data to support decision-making

•  the  incorporation  of  additional  sources  of  relevant  data  poses  methodological  and  organizational  challenges. 

So,  we  see  that  the  OT&E  community  is  faced  with  an  analysis  challenge:  how  to  provide  meaningful  information 
to  the  acquisition  decision-maker  with  currently  available  tools  that  are  inadequate  for  the  task.  The  OT&E 
community  needs  a  methodology  through  which  functional  performance  level  data  and  other  non-numerical 
information  can  be  combined  to  help  the  decision-maker  determine  a  system’s  task  accomplishment  capabilities. 
The method must be able to handle small data sample sizes, uncontrollable testing conditions, and all relevant
information regardless of its form, and it must not establish arbitrary pass/fail criteria. The Intelligent Hierarchical
Decision Architecture has been developed to address this OT&E analysis void.

METHODOLOGY 

This  section  describes  a  methodology  through  which  low-level  information  is  aggregated  and  synthesized  to  provide 
information  at  the  operational  task  level  using  the  Intelligent  Hierarchical  Decision  Architecture,  shown  in  Figure  1 
(Beers 1996). The Intelligent Hierarchical Decision Architecture is composed of four components: a Clustering
Methodology, which takes the raw test data and forms a fuzzy distribution; a Fuzzy Associative Memory, which
performs the transformation from the functional performance level to the operational task level; a Fuzzy Cognitive
Map, which adjusts the system performance measurement indicated by the testing effort for factors that could not be
controlled or included in the testing; and an Aggregation Methodology, which aggregates the system performance
across the logical divisions of the system performance. We begin with a short description of fuzzy set theory, then
describe each of the major components of the Intelligent Hierarchical Decision Architecture.

FUZZY  SET  THEORY  BASICS 


Throughout our formal mathematical education we are exposed to set theory. We learn in those early classes that an
element either is or is not a member of a set: black or white. Fuzzy set theory was introduced by Lotfi Zadeh in 1965
to handle situations where an element can be a partial member of a set (Zadeh 1965) (Zadeh 1973). The degree of
membership of an element within a fuzzy set is indicated by its membership function value, μ, a value in the range
[0,1], with zero indicating no membership and unity indicating full membership. Values between zero and one
indicate partial membership of the element within the set. Consider a man who is seven feet tall: he is clearly a
member of the set of tall men, so his membership function value with respect to that set would be unity,
μ_TALL = 1.0. On the other hand, a man who is 5'7" tall might be a member of the set of tall men only to a degree of
0.5, μ_TALL = 0.5. Finally, a man whose height is 3' clearly should not be considered a member of the set of tall men,
so his membership function value would be zero. The idea of this gradual transition from nonmembership to full
membership has found use in many engineering applications, particularly systems control applications, where fuzzy
set theory has improved the performance of such diverse equipment as subway trains, washing machines, and fault
detection systems (McNeill & Freiberger 1990). It also offers a means for the testing community to consider system
performance evaluations in a more realistic manner. With current analysis methods, the testing community must
draw a line in the system performance space: a hard and fast pass/fail criterion. However, the criterion is seldom
that black and white. Why should an electronic combat system that causes a missile to miss an aircraft by 14'11" be
considered a failure, while one that causes a miss distance of 15'1" is considered a success? Can we really justify
that precision in our evaluation criteria, or would a gradual transition from bad to good performance be more
realistic? The Intelligent Hierarchical Decision Architecture uses fuzzy set theory and fuzzy logic concepts
throughout its processing to allow a more realistic and meaningful evaluation of the operational testing data. Now
we turn to a discussion of the four stages of the hierarchical structure.
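The tall-man and miss-distance examples can be made concrete with a linear "ramp" membership function. In the sketch below, the ramp endpoints are assumptions chosen for illustration: 50 in (4'2") to 84 in (7'0") for TALL, which happens to place 5'7" (67 in) at exactly 0.5, and a hypothetical 13 ft to 17 ft transition zone for miss distance, set against the crisp 15 ft pass/fail line implied by the text.

```python
def ramp_membership(x, lo, hi):
    """Membership rising linearly from 0 at x <= lo to 1 at x >= hi."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def mu_tall(height_in):
    # Assumed ramp from 4'2" (50 in) to 7'0" (84 in); puts 5'7" (67 in) at 0.5.
    return ramp_membership(height_in, 50, 84)


def crisp_success(miss_ft, threshold_ft=15.0):
    """Hard pass/fail criterion at an assumed 15 ft threshold."""
    return miss_ft >= threshold_ft


def fuzzy_success(miss_ft):
    # Hypothetical transition zone from 13 ft (clearly bad) to 17 ft (clearly good).
    return ramp_membership(miss_ft, 13.0, 17.0)


short_miss = 14 + 11 / 12  # 14'11"
long_miss = 15 + 1 / 12    # 15'1"
```

Under the crisp rule the two miss distances land on opposite sides of the line, whereas their fuzzy memberships differ by only 1/24 (about 0.04), which reflects the point made in the text.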
Figure 1 Intelligent Hierarchical Decision Architecture


STEP#1:  CLUSTERING  METHODOLOGY 

The Clustering Methodology is the first stage in the Intelligent Hierarchical Decision Architecture. It takes the raw test data and forms them into a fuzzy set called a Composite Fuzzy Membership Function, or COMMFFY. This COMMFFY, formed through the three-step process described below, is intended to be an optimal description of the original raw test data, in fuzzy set form, at the Measure of Functional Performance (MOFP) level. That is, a COMMFFY is built for each test measure for which data are gathered, and it is used in subsequent processing within the Intelligent Hierarchical Decision Architecture.

The first step within the Clustering Methodology is to define fuzzy sets, called Basic Membership Functions, which will be the basis for constructing the COMMFFY. These fuzzy sets can be developed in one of two ways: through a fuzzy clustering method or through a heuristic approach. The fuzzy clustering method (Gath & Geva 1989) requires that enough data describing each Measure of Functional Performance be available to perform a fuzzy clustering algorithm. Because this is frequently not the case during operational testing, we concentrate here on the heuristic approach. Using that approach, we look at all the possible values that a variable can take on, its universe of discourse, and define fuzzy sets within the universe of discourse that adequately describe, in linguistic terms, regions of that universe. For example, the triangular-shaped Basic Membership Functions shown in Figure 2 divide the universe of discourse into five equal segments with a 50% overlap. The linguistic tags LO, LOMED, MED, MEDHI, and HI describe the fuzzy sets and are used in subsequent stages to facilitate an intuitive understanding of the algorithmic processing.
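The heuristic construction can be sketched as follows. This is a minimal sketch, assuming evenly spaced triangles whose feet sit on their neighbours' peaks, which is one common reading of "50% overlap"; the 0-100 universe and the helper names are hypothetical.

```python
from typing import Dict, List


def triangular(x: float, left: float, center: float, right: float) -> float:
    """Membership of x in a triangle with feet at left/right and peak at center."""
    if x == center:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    if x < center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def basic_membership_functions(lo: float, hi: float, tags: List[str]) -> Dict[str, tuple]:
    """Evenly spaced triangles whose neighbours cross at 0.5.

    Each set's feet sit on its neighbours' peaks, so for any x inside [lo, hi]
    the memberships of the (at most two) active sets sum to 1.
    """
    step = (hi - lo) / (len(tags) - 1)
    centers = [lo + i * step for i in range(len(tags))]
    return {tag: (c - step, c, c + step) for tag, c in zip(tags, centers)}


TAGS = ["LO", "LOMED", "MED", "MEDHI", "HI"]
bmfs = basic_membership_functions(0.0, 100.0, TAGS)   # hypothetical 0-100 universe
x = 40.0
print({tag: round(triangular(x, *abc), 3) for tag, abc in bmfs.items()})
# {'LO': 0.0, 'LOMED': 0.4, 'MED': 0.6, 'MEDHI': 0.0, 'HI': 0.0}
```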


Figure  2  Sample  Basic  Membership  Functions  and  Five  Sample  Data  Points 


Once the Basic Membership Functions have been defined, one of four Compositional Methods is used to form the COMMFFY: Max-Max, Max-All, Min-Max, or Min-All. The four methods differ in how they apply the raw data points to the Basic Membership Functions. First, the inner operation (the second term of the name, either Max or All) is applied. Then we look inside each Basic Membership Function to perform the outer operation (the first term, either Max or Min). Finally, the COMMFFY is formed by joining the components of the Basic Membership Functions derived from these two operations. The inner operation describes how each data point interacts with each Basic Membership Function, or fuzzy set. For example, with the xxx-Max operation, each data point activates only the fuzzy set in which its membership is a maximum, whereas with the xxx-All operation it activates every fuzzy set it intersects. In the sample shown in Figure 2, consider the data point labeled #1: it intersects both the MED fuzzy set and the LOMED fuzzy set, and its membership is a maximum in the LOMED set. Data point #2 has its maximum activation in the MED set. Once the inner operation has considered all the test data for a given measure, we turn to the outer operation. In this case, consider the Max-xxx operation. Now we look within each fuzzy set and find the maximum of all the activation levels generated by the inner operation. In this example, the maximum within the LOMED set is the activation level contributed by point #1, the maximum within the MED set is the activation level contributed by point #3, and so on. Once the inner and outer operations have been completed, the COMMFFY is formed by taking, for each member of the universe of discourse, the maximum activation level within any Basic Membership Function. Figure 3 shows the COMMFFY resulting from the Max-Max compositional method for the sample data points and Basic Membership Functions shown in Figure 2.
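A sketch of the four Compositional Methods follows, naming each method outer-inner as in the text (Max-All means outer Max, inner All). Several details are assumptions not fixed by the text: ties in the inner Max go to the first set examined, a set activated by no data point gets level 0, and the join clips each Basic Membership Function at its level before taking the pointwise maximum (Figure 3 may instead scale the sets). The data points are hypothetical, not those of Figure 2.

```python
from typing import Dict, List, Sequence, Tuple

Triangle = Tuple[float, float, float]   # (left foot, peak, right foot)


def triangular(x: float, tri: Triangle) -> float:
    left, center, right = tri
    if x == center:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (center - left) if x < center else (right - x) / (right - center)


def set_levels(data: Sequence[float], bmfs: Dict[str, Triangle],
               outer: str, inner: str) -> Dict[str, float]:
    """Activation level of each Basic Membership Function.

    inner='Max': a point activates only the set where its membership is largest
                 (ties go to the first set in dict order).
    inner='All': a point activates every set it intersects.
    outer='Max'/'Min': combine the activations that landed in each set.
    Sets that no point activates get level 0 (an assumption).
    """
    hits: Dict[str, List[float]] = {tag: [] for tag in bmfs}
    for x in data:
        memberships = {tag: triangular(x, tri) for tag, tri in bmfs.items()}
        if inner == "Max":
            best = max(memberships, key=memberships.get)
            if memberships[best] > 0:
                hits[best].append(memberships[best])
        elif inner == "All":
            for tag, mu in memberships.items():
                if mu > 0:
                    hits[tag].append(mu)
        else:
            raise ValueError(f"unknown inner operation {inner!r}")
    combine = {"Max": max, "Min": min}[outer]
    return {tag: combine(v) if v else 0.0 for tag, v in hits.items()}


def commffy(levels: Dict[str, float], bmfs: Dict[str, Triangle],
            grid: Sequence[float]) -> List[float]:
    """Join the activated sets: clip each set at its level, then take the pointwise max."""
    return [max(min(levels[tag], triangular(x, tri)) for tag, tri in bmfs.items())
            for x in grid]


bmfs = {"LO": (-25, 0, 25), "LOMED": (0, 25, 50), "MED": (25, 50, 75),
        "MEDHI": (50, 75, 100), "HI": (75, 100, 125)}
data = [30.0, 45.0, 52.0, 60.0, 80.0]          # hypothetical test measurements
grid = [float(x) for x in range(0, 101, 5)]
for method in ["Max-Max", "Max-All", "Min-Max", "Min-All"]:
    outer, inner = method.split("-")
    levels = set_levels(data, bmfs, outer, inner)
    print(method, {t: round(v, 2) for t, v in levels.items()})
```

Because inner All lets one point feed several sets, Min-All tends to yield the lowest levels and Max-All the highest.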


Figure  3  COMMFFY  Resulting  from  Max-Max  Compositional  Method 


The  choice  of  which  Compositional  Method  to  use  for  a  given  data  set  is  determined  through  an  on-line 
optimization  using  a  fuzzy/statistical  similarity  measure  that  was  developed  to  relate  the  COMMFFY  with  a  normal 
statistical  distribution  that  would  be  generated  from  the  same  data.  With  the  COMMFFY  formed  for  each  Measure 
of Functional Performance, we now turn to the next stage in the Intelligent Hierarchical Decision Architecture, the
Fuzzy  Associative  Memory. 
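The paper does not spell out its fuzzy/statistical similarity measure in this section, so the sketch below substitutes a hypothetical one: fit a normal curve to the data, rescale it to a peak of 1, and score each candidate COMMFFY by one minus the mean absolute difference. The data are the Threat B runs from Table 1 later in the paper; the two candidate curves and all function names are made up for illustration.

```python
import math
import statistics
from typing import Dict, List, Sequence


def scaled_normal(grid: Sequence[float], data: Sequence[float]) -> List[float]:
    """Normal curve fitted to the data (sample mean and sd), rescaled to peak at 1."""
    mu = statistics.mean(data)
    sd = statistics.stdev(data)
    return [math.exp(-0.5 * ((x - mu) / sd) ** 2) for x in grid]


def similarity(curve: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical similarity: 1 minus the mean absolute difference (1 = identical)."""
    return 1.0 - sum(abs(a - b) for a, b in zip(curve, reference)) / len(curve)


def choose_method(candidates: Dict[str, List[float]], grid: Sequence[float],
                  data: Sequence[float]) -> str:
    """Pick the compositional method whose COMMFFY best matches the fitted normal."""
    reference = scaled_normal(grid, data)
    return max(candidates, key=lambda name: similarity(candidates[name], reference))


data = [57.17, 53.44, 55.46, 62.48, 62.95, 51.33, 59.11, 53.73, 52.97, 51.63]
grid = [float(x) for x in range(0, 101, 5)]
candidates = {
    "Max-Max": [max(0.0, 1.0 - abs(x - 55.0) / 15.0) for x in grid],   # hypothetical
    "Min-All": [0.3 for _ in grid],                                     # hypothetical
}
print(choose_method(candidates, grid, data))
```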

STEP #2: FUZZY ASSOCIATIVE MEMORY

The second stage of the Intelligent Hierarchical Decision Architecture transforms the information at the functional performance level to information at the operational task level using a Fuzzy Associative Memory, essentially a set of rules that relate the performance at the two levels in terms of fuzzy sets (Kosko 1992). Once the performance due to each of the functional performance measures has been transformed to the operational task level, the information is aggregated into a single COMMFFY at the operational task level using a modification of the Reduction Theorem (Wang & Vachtsevanos 1990).

The  rules  within  the  Fuzzy  Associative  Memory  can  initially  be  built  using  expert  judgment,  then  subsequently 
updated  as  more  information  is  gathered  on  the  system-under-test’s  performance,  through  testing  or  modeling  and 
simulation.  The  Fuzzy  Associative  Memory  takes  the  form  shown  in  Figure  4.  Each  of  the  boxes  pictured  in  Figure 
4  is  a  rule  bank  relating  the  fuzzy  sets  at  the  functional  performance  level  with  the  fuzzy  sets  at  the  operational  task 
level. 
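A rule bank can be represented as a simple mapping from functional-level tags to task-level tags. In the sketch below the rule banks, tags, and activation levels are hypothetical; each rule passes its antecedent's activation to its consequent, and the cross-MOFP aggregation uses a plain union (max) as a stand-in for the modified Reduction Theorem, whose details are not given here.

```python
from typing import Dict

# Hypothetical rule banks: for each MOFP, "if MOFP is <tag> then MOTA is <tag>".
RULE_BANKS: Dict[str, Dict[str, str]] = {
    "MOFP1": {"LO": "LO", "MED": "LOMED", "HI": "MED"},
    "MOFP2": {"LO": "LOMED", "MED": "MED", "HI": "HI"},
}


def fire_rule_bank(mofp_levels: Dict[str, float], rules: Dict[str, str]) -> Dict[str, float]:
    """Map activation levels of MOFP-level sets onto MOTA-level sets.

    Each rule passes its antecedent's activation to its consequent; when several
    rules share a consequent, the strongest one wins (max).
    """
    out: Dict[str, float] = {}
    for antecedent, consequent in rules.items():
        level = mofp_levels.get(antecedent, 0.0)
        out[consequent] = max(out.get(consequent, 0.0), level)
    return out


def aggregate(per_mofp: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Placeholder aggregation across MOFPs (union/max), NOT the paper's
    modified Reduction Theorem."""
    combined: Dict[str, float] = {}
    for levels in per_mofp.values():
        for tag, level in levels.items():
            combined[tag] = max(combined.get(tag, 0.0), level)
    return combined


mofp_levels = {                                   # hypothetical functional-level activations
    "MOFP1": {"LO": 0.1, "MED": 0.9, "HI": 0.3},
    "MOFP2": {"LO": 0.0, "MED": 0.4, "HI": 0.7},
}
task_levels = aggregate({m: fire_rule_bank(lv, RULE_BANKS[m]) for m, lv in mofp_levels.items()})
print(task_levels)
# {'LO': 0.1, 'LOMED': 0.9, 'MED': 0.4, 'HI': 0.7}
```

Keeping the rules as plain data makes it easy to start from expert judgment and later edit individual entries as test or simulation results accumulate.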


Figure  4  Intelligent  Hierarchical  Decision  Architecture’s  Fuzzy  Associative  Memory  Structure 


The transformation from the functional performance level to the operational task level is accomplished using the Fuzzy Associative Memory as described above, yielding, at the output of this second stage, a COMMFFY indicating the system’s performance at the operational task-accomplishment level.

STEP #3: FUZZY COGNITIVE MAP

Frequently, there are factors that cannot be included or controlled during an operational test, yet are known to affect the outcome of the system performance measure. To adjust the test-derived system performance measurement for these factors, we use a Fuzzy Cognitive Map.

A Fuzzy Cognitive Map is a diagram indicating cause-and-effect relationships between factors, developed originally by Bart Kosko based upon the work of Robert Axelrod (Axelrod 1976). Using the map, Kosko demonstrated that “what-if” questions could be answered by performing a series of matrix multiplication and thresholding operations on the matrix derived from the map (Kosko 1986). The Intelligent Hierarchical Decision Architecture uses Kosko’s work as a foundation and uses the map to adjust the system performance indicated by the test measurements for factors that could not be controlled or included during the testing effort. This adjustment is accomplished using the following steps:

• Define a Fuzzy Cognitive Map (FCM) that relates the untestable and uncontrollable factors to the Measure of Task Accomplishment (MOTA) used during the testing effort. Figure 5 illustrates an FCM relating factors that could affect the outcome during the testing of an electronic combat system. The linguistic tags define the degree and direction of the effects. For example, if the Number of Threats in the Scenario cannot be adequately represented (i.e., there are fewer threats on the test range than would be encountered in a wartime environment), the Reduction in Pk measured during the test would be some amount better than it would be in wartime; therefore, a -some adjustment should be made to the measured performance.


Figure  5  Sample  Fuzzy  Cognitive  Map  for  an  Electronic  Combat  System 


• Looking at each concept within the map, define the possible paths from that concept to the task-accomplishment measure concept. For example, on the map shown in Figure 5, starting at Number of Threats in Scenario, we can define a path directly to Reduction in Pk and a path that goes through Target/Threat Relative Distance and then to Reduction in Pk, etc. This path definition step can be simplified by using matrix/vector multiplication to determine the limit cycles, as described by Kosko (1986). The activated concepts are those that need to be considered in the path definition process.

• Once all the possible paths from each concept to the central concept have been defined, find the minimum value of the linguistic tags associated with each path, ignoring the signs of the tags (this requires an importance ordering of the tags used to define the links, e.g., little < some < much < very). Once the minimum value of each of the possible paths is defined, take the maximum of these values for each concept across all its possible paths.

• Finally, rank order the linguistic tags associated with all the concepts in the map from most negative to most positive. The most-negative tag will be used to adjust the task-level COMMFFY to indicate the worst-case system performance, and the most-positive tag will be used to adjust the task-level COMMFFY to indicate the best-case system performance.

• The adjustment is carried out using the following adjustment formulae:

Best-Case Adjustment:

$$\mu_{PosAdj} = \mu_{old}^{1/k}$$

Worst-Case Adjustment:

$$\mu_{NegAdj} = \mu_{old}^{k}$$

where the value of k is chosen to provide an adequate adjustment to the fuzzy distribution values and μ represents the fuzzy membership function values.
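The path evaluation and adjustment steps can be sketched as below, reading the adjustment formulae as $\mu_{old}^{1/k}$ for the best case and $\mu_{old}^{k}$ for the worst case. Assumptions: a path's sign is the product of its link signs, ties between equally strong paths keep the first one found, and the exponents for little, some, and much are hypothetical (only k = 2.0 for very appears in the worked example later in the paper). The map itself is made up for illustration: it borrows concept names from Figure 5 but not its links, and "Crew Proficiency" is an invented concept.

```python
from typing import Dict, List, Tuple

TAG_ORDER = ["little", "some", "much", "very"]           # importance ordering from the text
# Hypothetical exponents; only k = 2.0 for "very" comes from the worked example.
K_FOR_TAG = {"little": 1.25, "some": 1.5, "much": 1.75, "very": 2.0}

Edge = Tuple[int, str]                 # (sign, tag), e.g. (-1, "some")
FCM = Dict[str, Dict[str, Edge]]       # concept -> {successor concept: edge}


def simple_paths(fcm: FCM, start: str, goal: str) -> List[List[Edge]]:
    """All cycle-free paths from start to goal, each as a list of edges."""
    paths: List[List[Edge]] = []

    def walk(node: str, visited: set, edges: List[Edge]) -> None:
        for nxt, edge in fcm.get(node, {}).items():
            if nxt == goal:
                paths.append(edges + [edge])
            elif nxt not in visited:
                walk(nxt, visited | {nxt}, edges + [edge])

    walk(start, {start}, [])
    return paths


def concept_effect(fcm: FCM, concept: str, goal: str) -> int:
    """Signed tag rank (+/-1..4): min along each path (signs ignored), max across paths."""
    best = 0
    for path in simple_paths(fcm, concept, goal):
        strength = min(TAG_ORDER.index(tag) + 1 for _, tag in path)
        sign = 1
        for s, _ in path:
            sign *= s                  # assumption: path sign = product of link signs
        if strength > abs(best):       # strictly stronger paths replace earlier ones
            best = sign * strength
    return best


def adjust(mu: List[float], signed_rank: int) -> List[float]:
    """Positive tag: mu**(1/k) (best case); negative tag: mu**k (worst case)."""
    if signed_rank == 0:
        return list(mu)
    k = K_FOR_TAG[TAG_ORDER[abs(signed_rank) - 1]]
    power = 1.0 / k if signed_rank > 0 else k
    return [m ** power for m in mu]


GOAL = "Reduction in Pk"
fcm: FCM = {                                             # hypothetical links, not Figure 5
    "Number of Threats in Scenario": {"Target/Threat Relative Distance": (+1, "much"),
                                      GOAL: (-1, "some")},
    "Target/Threat Relative Distance": {GOAL: (-1, "very")},
    "Crew Proficiency": {GOAL: (+1, "much")},
}
effects = {c: concept_effect(fcm, c, GOAL) for c in fcm}
worst, best = min(effects.values()), max(effects.values())
commffy = [0.0, 0.25, 1.0, 0.5, 0.0]                     # hypothetical task-level COMMFFY
print(effects)
print("worst:", adjust(commffy, worst))
print("best: ", adjust(commffy, best))
```

Raising membership values to a power k > 1 lowers every partial membership while leaving full members at 1, and the root 1/k raises them, which is why the same tag can serve both adjustments.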
STEP #4: AGGREGATION METHODOLOGY


The  first  three  stages  of  the  Intelligent  Hierarchical  Decision  Architecture  are  carried  out  for  each  logical  division 
of  the  system-under-test’s  performance.  For  example,  for  the  testing  of  an  electronic  combat  system,  the  first  three 
stages  would  be  carried  out  for  each  threat  system  that  the  electronic  combat  system  is  tested  against.  The  final  stage 
aggregates  the  system  performance  across  the  logical  divisions,  providing  the  final  result,  a  probabilistic  bound  on 
the  system  performance  at  the  operational  task  level. 

The aggregation is carried out using Dempster’s Rule of Combination, taken from the Dempster-Shafer Theory of Evidential Reasoning (Shafer 1976). Using this method, the adjusted task-level COMMFFYs for the best-case system performance are combined to form a best-case probabilistic bound, and the worst-case COMMFFYs are combined to form a worst-case probabilistic bound. These two probabilistic bounds, along with a measure of the Degree of Certainty associated with each possible hypothesis, are provided to the decision-maker as the outcome of the operational testing effort. The basic steps of the aggregation method are as follows.

• The maximum degree of membership within each of the original Basic Membership Functions is determined from the COMMFFYs. For each logical division of the system-under-test’s performance, possible hypothesis sets are defined by taking alpha-level cuts of the fuzzy set defined from the Basic Membership Function values. Subtracting successive alpha-level values gives the Dempster-Shafer basic probability assignment value for each hypothesis (Yen 1990). The basic probability assignment is the amount of evidence pointing to that hypothesis being true.

• The evidence associated with each logical division of the system-under-test’s performance is then combined two-by-two with the other divisions’ evidence using the intersection tableau method (Gordon & Shortliffe 1985), which provides a logical application of Dempster’s Rule of Combination. Given two pieces of evidence that provide information on the hypothesis Ψ, denoted $m_1(\Psi)$ and $m_2(\Psi)$, the combined basic probability assignment is denoted $m_{12}(\Psi)$ and is given by:

$$m_{12}(\Psi) = \frac{\sum_{A \cap B = \Psi} m_1(A)\, m_2(B)}{1 - K}$$

where K is the normalization (conflict) factor and is given by:

$$K = \sum_{A \cap B = \emptyset} m_1(A)\, m_2(B)$$


• The belief function, defined as the lower probability bound on a hypothesis, and the plausibility function, defined as the upper probability bound on the hypothesis, are calculated using the formulae shown below, where m(A) is the basic probability assignment value associated with hypothesis A and $\bar{B}$ denotes the complement of B (deKorvin & Shipley 1993).

$$Bel(B) = \sum_{A \subseteq B} m(A)$$

$$Pl(B) = \sum_{A \cap B \neq \emptyset} m(A) = 1 - Bel(\bar{B})$$


• Finally, the Degree of Certainty is a value in the range [-1, +1] that indicates the amount of evidence pointing to the hypothesis as opposed to the amount of evidence pointing to contradicting hypotheses (Kim 1992). A value of +1 for the degree of certainty indicates that all the evidence points to the hypothesis and none points to contradicting hypotheses, a value of -1 indicates that all the evidence points to the contradicting hypotheses, and a value of zero indicates total ignorance, in that equal amounts of evidence point to the hypothesis and the contradicting hypotheses. The degree of certainty is calculated as

$$DOC(X) = m(X) - Bel(\bar{X})$$
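The four aggregation steps can be sketched with focal elements stored as frozensets of Basic Membership Function tags. This is a minimal sketch, not the paper's implementation: when the highest membership level is below 1, the leftover mass is assigned to the whole frame (an assumption standing for ignorance), the degree of certainty is read as m(X) minus the belief in the complement of X, and the two threat levels are hypothetical.

```python
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def bpa_from_levels(levels: Dict[str, float]) -> Mass:
    """Basic probability assignment from nested alpha-level cuts.

    Focal element A_i = {tags with level >= alpha_i}; m(A_i) = alpha_i - alpha_(i+1).
    Leftover mass (top level below 1) goes to the whole frame, an assumption.
    """
    frame = frozenset(levels)
    alphas = sorted({a for a in levels.values() if a > 0.0}, reverse=True)
    m: Mass = {}
    for i, alpha in enumerate(alphas):
        next_alpha = alphas[i + 1] if i + 1 < len(alphas) else 0.0
        cut = frozenset(t for t, a in levels.items() if a >= alpha)
        m[cut] = m.get(cut, 0.0) + (alpha - next_alpha)
    remainder = 1.0 - (alphas[0] if alphas else 0.0)
    if remainder > 0.0:
        m[frame] = m.get(frame, 0.0) + remainder
    return m


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's Rule of Combination, filled in cell by cell like an intersection tableau."""
    combined: Mass = {}
    conflict = 0.0                                   # K: product mass on the empty set
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {h: v / (1.0 - conflict) for h, v in combined.items()}


def belief(m: Mass, h: FrozenSet[str]) -> float:
    return sum(v for a, v in m.items() if a <= h)   # focal elements inside h


def plausibility(m: Mass, h: FrozenSet[str]) -> float:
    return sum(v for a, v in m.items() if a & h)    # focal elements touching h


def degree_of_certainty(m: Mass, h: FrozenSet[str], frame: FrozenSet[str]) -> float:
    return m.get(h, 0.0) - belief(m, frame - h)


threat_1 = {"LOMED": 0.3, "MED": 1.0, "MEDHI": 0.6}   # hypothetical BMF maxima
threat_2 = {"LOMED": 0.0, "MED": 0.8, "MEDHI": 1.0}
frame = frozenset(threat_1)
m12 = dempster_combine(bpa_from_levels(threat_1), bpa_from_levels(threat_2))
med = frozenset({"MED"})
print(f"MED: Bel={belief(m12, med):.4f}, Pl={plausibility(m12, med):.4f}, "
      f"DOC={degree_of_certainty(m12, med, frame):.4f}")
```

For more than two divisions, the same function can be applied pairwise, feeding each combined assignment into the next combination.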


With  the  belief  and  plausibility  functions  and  the  degree  of  certainty  calculated,  the  decision-maker  is  provided  with 
the  final  probabilistic  bound  of  the  system  performance  at  the  task-accomplishment  level.  The  final  result  is  given  in 
the  form: 


The best-case system performance is linguistic tag (where the tag is associated with the basic membership function(s) representing the most likely hypothesis) with probability range [0.xxxx, 0.yyyy] (where 0.xxxx is the belief function value and 0.yyyy is the plausibility function value). The degree of certainty associated with this statement is zz% (where zz is the degree of certainty).

EXAMPLE 

To illustrate the methodology described in the previous section, a brief example is given here. Consider an aircraft-mounted jammer system that has six Measures of Functional Performance (MOFPs) and is tested against four separate threat systems. The decision-maker is interested in determining the system’s ability to reduce the probability of kill of the aircraft carrying the jammer. The evaluation framework for the system, called Jammer-X, is shown in Figure 6.


During the OT&E, data are gathered on each of the functional performance measures, and current analysis methods provide the decision-maker with 24 pass/fail results at that level (six MOFPs against each of four threats), requiring the decision-maker to draw high-level conclusions from this low-level information. The Intelligent Hierarchical Decision Architecture will be used to form a probabilistic bound on the system’s Reduction in Pk capabilities, based upon the measurements taken on the aspects of the system’s technical performance shown in Figure 6. The first step is to define the Basic Membership Functions and apply the test data to them to form a Composite Fuzzy Membership Function, or COMMFFY, for each MOFP/threat combination. The Basic Membership Functions chosen for this example are triangular-shaped, with a 50% overlap, as shown in Figure 7.
Using the Basic Membership Functions shown in Figure 7 as the foundation, and applying the test measurements taken on one of the MOFPs, shown in Table 1 as an example, results in the COMMFFYs illustrated in Figure 8.


Percent Increase in Break Locks

Run Number   Threat A   Threat B   Threat C   Threat D
1            76.65      57.17      43.64      23.78
2            89.77      53.44      47.77      35.93
3            90.92      55.46      35.93      16.65
4            90.47      62.48      48.65      48.65
5            98.66      62.95      31.90      52.97
6            94.78      51.33      49.40      68.85
7            91.38      59.11      38.41      59.11
8            94.10      53.73      46.44      46.15
9            76.15      52.97      30.63      77.02
10           77.02      51.63      35.10      76.15

Table 1 Raw Test Data Collected for MOFP #3, Percent Increase in Break Locks
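To show the first stage on real inputs, the sketch below applies the Max-Max method to the Table 1 runs. Figure 7 is not reproduced here, so the Basic Membership Functions are hypothetical (triangles centred every 10% with neighbours crossing at 0.5, tagged by their centre), and the resulting levels should not be expected to match Figure 8.

```python
from typing import Dict, List, Tuple

# Table 1: percent increase in break locks, runs 1-10, MOFP #3.
TABLE_1: Dict[str, List[float]] = {
    "Threat A": [76.65, 89.77, 90.92, 90.47, 98.66, 94.78, 91.38, 94.10, 76.15, 77.02],
    "Threat B": [57.17, 53.44, 55.46, 62.48, 62.95, 51.33, 59.11, 53.73, 52.97, 51.63],
    "Threat C": [43.64, 47.77, 35.93, 48.65, 31.90, 49.40, 38.41, 46.44, 30.63, 35.10],
    "Threat D": [23.78, 35.93, 16.65, 48.65, 52.97, 68.85, 59.11, 46.15, 77.02, 76.15],
}

# Hypothetical Basic Membership Functions: peaks every 10 %, half-width 10 %.
BMFS: Dict[str, Tuple[float, float, float]] = {
    f"C{c}": (c - 10.0, float(c), c + 10.0) for c in range(0, 101, 10)
}


def triangular(x: float, tri: Tuple[float, float, float]) -> float:
    left, center, right = tri
    if x == center:
        return 1.0
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (center - left) if x < center else (right - x) / (right - center)


def max_max_levels(data: List[float]) -> Dict[str, float]:
    """Max-Max: each run activates only its strongest set; keep the largest activation per set."""
    levels = {tag: 0.0 for tag in BMFS}
    for x in data:
        mu = {tag: triangular(x, tri) for tag, tri in BMFS.items()}
        best = max(mu, key=mu.get)
        levels[best] = max(levels[best], mu[best])
    return levels


for threat, runs in TABLE_1.items():
    active = {t: round(v, 3) for t, v in max_max_levels(runs).items() if v > 0}
    print(threat, active)
```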
Figure 8 Functional Performance Level COMMFFYs for MOFP #3 (panels include MOFP #3 / Threat B COMMFFY)


The COMMFFYs illustrated in Figure 8 are four of the 24 that would be formed in the first stage of the hierarchy’s processing for this example. Once all the functional performance level COMMFFYs have been formed, the Fuzzy Associative Memory is used to transform these fuzzy distributions to COMMFFYs at the task-accomplishment level, as described in Step #2 of the methodology section. Each COMMFFY formed at the Measure of Task Accomplishment (MOTA) level results from the aggregation of the six functional performance level COMMFFYs for that threat. The resulting MOTA-level COMMFFYs for this example are shown in Figure 9.


Figure 9 Task Accomplishment Level COMMFFYs for Jammer-X (panels: Threat A, Threat B, Threat C, and Threat D MOTA-level COMMFFYs; horizontal axis: Percent Reduction in Probability of Kill)


The  COMMFFYs  shown  in  Figure  9  represent  the  task-level  system  performance  demonstrated  during  the  testing 
effort.  In  most  cases,  the  testing  effort  cannot  include  or  control  all  the  factors  known  to  affect  system  performance. 
Therefore,  in  the  third  stage  of  the  Intelligent  Hierarchical  Decision  Architecture  these  COMMFFYs  are  adjusted  for 
the  effects  of  those  factors,  as  described  in  Step  #3  of  the  methodology  section.  Using  the  Fuzzy  Cognitive  Map 
shown in Figure 5, the best-case adjustment is +very and the worst-case adjustment is -very. If an adjustment factor of 2.0 is used in association with the linguistic tag very, then the COMMFFYs resulting from this adjustment for the Threat B performance look like those shown in Figure 10.


Figure 10 Adjusted Task Accomplishment Level Performance Against Threat B (panels: Best-Case Adjusted Threat B COMMFFY (+Very); Worst-Case Adjusted Threat B COMMFFY (-Very))


Finally, in order to provide a single, probabilistic system performance bound to the decision-maker, the Dempster-Shafer theory is used, as described in Step #4 of the methodology section. The information provided to the decision-maker would be as shown below.

The Jammer-X’s Best-Case Performance is LoMedHi (the Basic Membership Function centered at 60% Reduction in Pk) with probability range [0.9621, 0.9925]. The degree of certainty associated with this statement is 92.42%.

The  Jammer-X’s  Worst-Case  Performance  is  LoMedHi  (the  Basic  Membership  Function  centered  at  60%  Reduction 
in  Pk)  with  probability  range  [0.7443,  0.8545].  The  degree  of  certainty  associated  with  this  statement  is  48.86%. 


CONCLUSION 

Since the inception of the Strategy-to-Task evaluation framework, the operational test community has struggled to find a way to take the low-level test data generated during testing events or through modeling and simulation and use them to provide information to the acquisition decision-maker that is meaningful to the decisions being made.

Current analysis methods used by the community are limited to standard statistical methods, which provide a means for summarizing the information but do not readily provide a means for extrapolating the gathered information to higher information levels where it is meaningful for the decision being made. Modeling and simulation efforts, such as Monte Carlo simulation, could be used, but they do not allow for the consideration of qualitative information or a realistic approach to the analysis that includes gradual transitions from good-to-bad system performance. The Intelligent Hierarchical Decision Architecture described here can be used to take the low-level functional performance data generated during the testing effort and synthesize and aggregate them into a probabilistic system bound at the operational task level. In addition to considering the information gathered during the testing, it provides a method through which non-testable or non-controllable factors can be considered. It allows the consideration of qualitative as well as quantitative information and is not constrained by sample size requirements, as current statistical methods are. The methodology allows smooth transitions from good-to-bad system performance, yet yields a definitive statement, in probabilistic terms, on the system’s capabilities as a final output. With this methodology, the operational test community can more adequately provide the information that the acquisition decision-making community expects.
ABSTRACT

Army decision makers are forced to rely heavily on the results of simulations when making programmatic decisions about developmental systems. Quantifying a measure of assurance associated with achieving a specified level of Pk has been an ongoing problem in the Army community.

The U.S. Army Materiel Systems Analysis Activity (AMSAA) has developed a methodology based on Bayesian analysis that quantifies the probability of belief associated with the Pk output from a simulation model. The approach is to quantify the distribution of uncertainty in the input parameters to the simulation model based on available test data. This uncertainty is used to generate a distribution of belief regarding the output Pk of the simulation model. The generated Pk belief distribution gives the Bayesian probability that the specified level of Pk has been achieved. This paper describes the methodology developed by AMSAA and discusses an example that demonstrates the applicability of the methodology.

INTRODUCTION 

When Army missile system development programs reach a milestone decision point, estimates of true system performance are compared to required performance (expressed by probability of kill (Pk)) as a means of determining whether or not the program should continue. Because the true system performance is unknown, a measure of assurance that the true performance exceeds the requirement is as important as the estimate of true performance itself. When the data used to develop the estimates of system performance come from system-level testing of actual hardware, that assurance is often expressed in terms of confidence bounds using classical statistical analysis techniques. The statistical confidence depends on the number of system flight tests. In today’s environment, funding available for testing Army missile systems is decreasing. As these systems become more complex, testing also becomes more complicated and expensive. Many developmental programs are conducting more component and subsystem testing to quantify the performance parameters that define overall system performance, and then executing complex simulations to relate these performance parameters to Pk. There are fewer tests of the entire system where system Pk can be measured directly. This presents the challenge of determining a suitable, quantifiable measure of assurance that true system performance exceeds the requirement when data come from simulations and a wide variety of test sources.

In May 1996, the U.S. Army Materiel Systems Analysis Activity (AMSAA) proposed to the office of the Deputy Undersecretary of the Army for Operations Research (DUSA-OR) a methodology that uses Bayesian techniques to provide a quantifiable measure of assurance that true system performance exceeds a stated goal. The DUSA-OR office asked that an example using the proposed methodology be conducted to evaluate its merits. The proposed methodology and the results of the example conducted for the DUSA-OR follow in this report. Throughout this report, the measure of system performance discussed is Pk.


Approved  for  public  release;  distribution  is  unlimited 
METHODOLOGY


Before discussing the methodology, it is important to understand the basic approach being advocated. Essentially, when decision makers want to know what assurance the system developers have in their performance estimates (expressed in terms of Pk), they want to know what assurance there is that the true, but unknown, Pk exceeds some goal (typically an operational requirement). The approach proposed in this report is to quantify that assurance through the Bayesian measure of belief probability. This concept is illustrated in Figure 1.


Figure  1.  Bayesian  Distribution  of  Belief  Curves. 

Overall system performance, as measured by Pk, is determined by the outcome of a series of more elementary processes. These elementary processes are governed by performance parameters that define the missile system at the component level. As an example, one of the elementary processes a missile system must execute is target track. One of the performance parameters that govern target track is the angular measurement accuracy of the missile seeker. When the relationship between the performance parameters that define a missile system and the resultant Pk cannot be expressed analytically, one can use a simulation. Many of the performance parameters being simulated are stochastic and are represented by their distribution parameters in the simulation. By randomly selecting from the distributions defining the performance parameter inputs and running the simulation in a Monte Carlo fashion, an estimate of system performance (Pk) is generated. Although typically expressed as a point estimate, in truth there is some distribution of belief regarding Pk (shown in Figure 1(a)) that defines the entire realm of possibilities for Pk. This distribution of belief results from the uncertainty one has regarding the distribution parameters that define the performance parameter inputs (assuming a correct simulation model).

Because there is uncertainty, the distribution parameters can have a range of values. A different estimate P̂k is generated each time different values for the distribution parameters are used.

Once the distribution of belief is quantified, the belief probability that Pk lies in any given interval can be determined. If one selects a goal value for Pk, then the area under the distribution of belief curve that lies to the right of the goal (assuming a scale that increases from left to right) is the belief probability that the true, but unknown, missile Pk exceeds the goal. As more knowledge is gained regarding the system, the uncertainty in the distribution parameters decreases, which results in a narrower distribution of belief regarding Pk (Figures 1(b) and 1(c)). If the true Pk exceeds the goal chosen (Figure 1(b)), one would expect the distribution of belief to shift to the right. As a result, a larger percentage of the curve will exceed the goal, and the belief probability that the true Pk exceeds the goal will increase. If the true Pk does not exceed the goal (Figure 1(c)), the distribution will shift to the left, a smaller percentage of the curve will exceed the goal, and the belief probability that the true Pk exceeds the goal will decrease. Even if the distribution of belief regarding Pk does not shift left or right, the belief probability that the true system performance exceeds the goal will change as more knowledge is gained, because the shape of the distribution will change.
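Given samples from a distribution of belief, the belief probability is simply the share of samples above the goal. The sketch below uses two hypothetical beta-distributed belief curves with the same mean (0.75) but different spreads, illustrating the last point above: narrowing the distribution changes the belief probability even without a shift. The goal of 0.70 and the beta parameters are assumptions for illustration only.

```python
import random
from typing import Sequence


def belief_probability(pk_samples: Sequence[float], goal: float) -> float:
    """Share of the Pk belief distribution lying to the right of the goal."""
    return sum(p > goal for p in pk_samples) / len(pk_samples)


rng = random.Random(1)
goal = 0.70
# Hypothetical belief distributions with the same mean (0.75) but different spreads.
wide = [rng.betavariate(15, 5) for _ in range(20000)]
narrow = [rng.betavariate(150, 50) for _ in range(20000)]
print(f"wide: {belief_probability(wide, goal):.3f}, narrow: {belief_probability(narrow, goal):.3f}")
```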

In the current process for generating performance data using Pk simulations, the uncertainty in the distribution parameters that define the distributions of the performance parameters is not captured. For each engagement point in space and for each target of interest, the simulation is executed a fixed number of times using the nominal distribution parameters associated with each of the performance parameters. Each simulation replication results in either a kill or no kill. Dividing the total number of kills by the total number of replications yields a point estimate of system performance (P̂k) for a given engagement point and threat. The methodology proposed by AMSAA captures the uncertainty in the distribution parameters of the critical performance parameters in the Pk simulation, thereby generating a distribution of belief regarding missile Pk.

Stated generally, the methodology is to first identify the set of critical performance parameter inputs ($X_1, X_2, \ldots,
X_k$) that have a significant impact on the simulation output. As an illustration, assume the $X_i$ are independent
normally distributed random variables with means $\theta_i$ and standard deviations $\sigma_i$ ($i = 1, \ldots, k$). The next step is to
characterize the distribution of uncertainty in $\theta_i$, $\sigma_i$, or both. As an example, assume each $\theta_i$ is known with
certainty, but each $\sigma_i$ has uncertainty about it that is represented by some probability density function, $\Omega(\sigma_i)$. Each
distribution of uncertainty is determined by the body of knowledge (test data, physics, engineering judgment,
requirements, etc.) about that particular performance parameter input ($X_i$) at a given point in time and is called the
prior distribution of uncertainty for $\sigma_i$.

Once the distributions of uncertainty are characterized, they can be introduced into the Pk simulation. The
process for introducing the uncertainty in the $\sigma_i$ is to randomly select a value for each $\sigma_i$ (where $i$ takes values from
1 to $k$) from its distribution of uncertainty (i.e., from each $\Omega(\sigma_i)$). This determines the distribution parameters
defining each $X_i$ in the simulation. The next step is to run sufficient replications of the simulation, where each
replication uses a set of values for the random variables $X_i$ drawn from their respective distributions, to
generate $\hat{P}_k$, an estimate of Pk (where $\hat{P}_k$ equals the number of kills achieved divided by the total number of
replications, as discussed above). Note that $\hat{P}_k$ is generated using fixed $\theta_i$ and $\sigma_i$ ($i = 1, \ldots, k$). Next, draw new
random values for the $\sigma_i$ ($i = 1, \ldots, k$) from the distributions of uncertainty and repeat the process. The result will be
a different $\hat{P}_k$. Repeat the entire process a sufficient number of times to generate a histogram of $\hat{P}_k$ outcomes.
This histogram is termed the estimated prior distribution of belief for Pk. If a goal value for Pk is selected, the
percentage of area to the right of the goal is the estimated belief probability that the true Pk exceeds the goal. As the
system under development progresses, more data will be gathered for each of the $X_i$ performance parameter inputs.
This new body of knowledge is used to update the distribution of uncertainty of each $\sigma_i$. The updated distribution
of uncertainty is termed the posterior distribution. The entire process is repeated using the current posterior
distribution of uncertainty for each $\sigma_i$ to develop the corresponding posterior distribution of belief regarding Pk.
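The two-level procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AMSAA's simulation: `toy_engagement` is a hypothetical stand-in for one replication of a Pk simulation (a kill is declared when a normally distributed miss distance with mean 0 and standard deviation $\sigma$ falls inside a hypothetical lethal radius), and only a single uncertain $\sigma$ is treated. The outer loop draws $\sigma$ from its distribution of uncertainty; the inner loop holds it fixed and computes $\hat{P}_k$ as kills divided by replications.

```python
import random
from typing import Callable, List


def toy_engagement(sigma: float, rng: random.Random, lethal_radius: float = 0.5) -> bool:
    """Hypothetical stand-in for one Pk-simulation replication:
    a kill occurs if a N(0, sigma) miss distance lies inside lethal_radius."""
    return abs(rng.gauss(0.0, sigma)) < lethal_radius


def pk_hat(sigma: float, replications: int, rng: random.Random) -> float:
    """Point estimate of Pk with sigma held fixed: kills / replications."""
    kills = sum(toy_engagement(sigma, rng) for _ in range(replications))
    return kills / replications


def distribution_of_belief(draw_sigma: Callable[[random.Random], float],
                           outer_draws: int, replications: int, seed: int = 0) -> List[float]:
    """Outer loop: draw sigma from its distribution of uncertainty.
    Inner loop: run the replications with that sigma, giving one Pk-hat per draw."""
    rng = random.Random(seed)
    return [pk_hat(draw_sigma(rng), replications, rng) for _ in range(outer_draws)]


def belief_probability(pk_values: List[float], goal: float) -> float:
    """Fraction of the Pk-hat histogram lying to the right of the goal."""
    return sum(p > goal for p in pk_values) / len(pk_values)


# Usage with a hypothetical distribution of uncertainty for sigma (uniform on [0.3, 0.6]).
pks = distribution_of_belief(lambda r: r.uniform(0.3, 0.6), outer_draws=200, replications=25)
print(belief_probability(pks, goal=0.8))
```

Separating `pk_hat` from the outer loop mirrors the text: each histogram entry is itself a Monte Carlo estimate computed with fixed distribution parameters.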

There is one important assumption that must be satisfied when selecting the critical performance parameter
inputs for this analysis. Because the analysis is based on reducing the uncertainty about the true, but unknown,
distribution parameters $\theta_i$ and $\sigma_i$, it is imperative that the true, but unknown, $\theta_i$ and $\sigma_i$ do not change. The
implication is that the $X_i$ selected for this analysis must have true, but unknown, $\theta_i$ and $\sigma_i$ that do not vary from
test to test, do not vary throughout a given test event, are not affected by changes made to the Pk simulation, and do
not change as the system matures. The limitation of this assumption is that some of the critical performance
parameters may have to be excluded from the analysis. Additionally, it is desirable that the $X_i$ performance parameter
inputs be independent; if there is dependence, it must be explicitly accounted for.

In addition to providing a quantifiable measure of assurance that the true Pk exceeds some goal, the approach has
another powerful benefit. The distribution of belief regarding Pk is determined by the uncertainty one has
regarding the distribution parameters that define the performance parameter distributional inputs. This uncertainty
is updated as new information is gathered on the performance parameters through testing. One can therefore
optimize a test strategy that focuses on reducing the uncertainty in those distribution parameters that contribute
most to the diffuseness of the distribution of belief regarding Pk.
THE EXAMPLE


To exercise the methodology, a Pk simulation existing at AMSAA was used. To simplify the process for the
example requested by the DUSA-OR, the uncertainty in the distribution parameters of only one performance
parameter input was considered. The variable selected for this example is normally distributed. Based on the
present body of knowledge, the mean ($\theta$) is known to be 0.0. The uncertainty lies with the standard deviation ($\sigma$).

If no data exist to form a distribution of uncertainty regarding the standard deviation, one can assume a
noninformative prior distribution of uncertainty. The distribution, $\Omega(\sigma)$, used for the standard deviation assuming
a noninformative prior is $1/\sigma$ for $\sigma > 0$. Although the noninformative prior distribution of uncertainty is a starting
point, it is not a proper density function (its integral over $\sigma > 0$ diverges) and cannot be used by itself to create a
distribution of belief for Pk. This distribution of uncertainty must first be updated with data. The data in Table 1 were
used to accomplish this update. The analyses were conducted using only the first two data points to update the
distribution of uncertainty regarding the standard deviation and then repeated using all nineteen data points. The
purpose of doing it twice is to show how the distribution of uncertainty for the standard deviation, the distribution of
belief regarding Pk, and the belief probability that the true Pk exceeds some goal change as more information is obtained.

Table 1. Data Used to Update Distribution of Uncertainty in the Standard Deviation

  i      x_i        i      x_i
  1     0.417      11     0.365
  2    -0.417      12    -0.340
  3     0.355      13     0.365
  4    -0.355      14    -0.360
  5     0.350      15     0.360
  6    -0.335      16    -0.365
  7     0.340      17     0.365
  8    -0.350      18    -0.365
  9     0.355      19     0.370
 10    -0.350

The data ($x_1, x_2, \ldots, x_{19}$) in Table 1, denoted by the vector $\mathbf{x}$, were used to update $\Omega(\sigma)$ and generate a
distribution of uncertainty for the standard deviation given the data. The updated distribution of uncertainty for $\sigma$,
given $\theta = 0$ and $\mathbf{x}$ (denoted by $\Omega(\sigma \mid \theta = 0, \mathbf{x})$), takes the form:


$$\Omega(\sigma \mid \theta = 0, \mathbf{x}) = K \cdot L(\mathbf{x}; \theta = 0, \sigma) \cdot \Omega(\sigma), \qquad (3.1)$$

where:

$K$ = normalizing constant to ensure $\int_0^\infty \Omega(\sigma \mid \theta = 0, \mathbf{x})\, d\sigma = 1$;

$L(\mathbf{x}; \theta = 0, \sigma)$ = likelihood function associated with $\mathbf{x}$ from the normal distribution with mean 0 and
standard deviation $\sigma$. By definition,

$$L(\mathbf{x}; \theta = 0, \sigma) = \prod_{i=1}^{n} \eta(x_i; \theta = 0, \sigma), \qquad (3.2)$$

where:

$\prod_{i=1}^{n}$ = product from $i = 1$ to $i = n$;

$\eta(x_i; \theta = 0, \sigma)$ = probability density function for $x_i$ given $\theta = 0$ and $\sigma$.

For a normally distributed random variable,

$$\eta(x_i; \theta = 0, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x_i^2/(2\sigma^2)},$$

since $\theta = 0$.


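Substituting into equation (3.2) yields

$$L(\mathbf{x}; \theta = 0, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-x_i^2/(2\sigma^2)} = \frac{1}{\sigma^{n}(2\pi)^{n/2}} \exp\!\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\right).$$

Substituting this likelihood and the noninformative prior $\Omega(\sigma) = 1/\sigma$ into equation (3.1) yields

$$\Omega(\sigma \mid \theta = 0, \mathbf{x}) = \frac{K}{\sigma^{n+1}(2\pi)^{n/2}} \exp\!\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\right). \qquad (3.3)$$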

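By definition, the integral of the probability density function $\Omega(\sigma \mid \theta = 0, \mathbf{x})$ from zero to infinity is one. Utilizing
this definition and solving for $K$ yields

$$K = \frac{(2\pi)^{n/2}}{\displaystyle\int_0^\infty \sigma^{-(n+1)} \exp\!\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\right) d\sigma}. \qquad (3.4)$$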


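Equation (3.4) can be solved explicitly by making the following substitution. Let

$$y = \frac{\sum_{i=1}^n x_i^2}{\sigma^2}, \quad \text{then} \quad dy = -2\sigma^{-3} \sum_{i=1}^n x_i^2\, d\sigma.$$

Note $y \to \infty$ as $\sigma \to 0$ and $y \to 0$ as $\sigma \to \infty$.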


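Substituting into equation (3.4) yields

$$K = \frac{(2\pi)^{n/2}}{\frac{1}{2}\left(\sum_{i=1}^n x_i^2\right)^{-n/2} \displaystyle\int_0^\infty y^{\frac{n}{2}-1} e^{-y/2}\, dy} = \frac{2(2\pi)^{n/2}\left(\sum_{i=1}^n x_i^2\right)^{n/2}}{\displaystyle\int_0^\infty y^{\frac{n}{2}-1} e^{-y/2}\, dy}. \qquad (3.5)$$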


The probability density function for the chi-square distribution with $n$ degrees of freedom is

$$f(x) = \frac{x^{\frac{n}{2}-1} e^{-x/2}}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}, \qquad x > 0.$$
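Since the integral of this density (a gamma density with shape $n/2$ and scale 2) from zero to infinity is one,
$\int_0^\infty y^{\frac{n}{2}-1} e^{-y/2}\, dy = 2^{n/2}\,\Gamma(n/2)$, and substituting into equation (3.5) yields

$$K = \frac{2\pi^{n/2}\left(\sum_{i=1}^n x_i^2\right)^{n/2}}{\Gamma\!\left(\frac{n}{2}\right)}. \qquad (3.6)$$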


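Substituting expression (3.6) back into equation (3.3) yields

$$\Omega(\sigma \mid \theta = 0, \mathbf{x}) = \frac{2\pi^{n/2}\left(\sum_{i=1}^n x_i^2\right)^{n/2}}{\Gamma\!\left(\frac{n}{2}\right)\sigma^{n+1}(2\pi)^{n/2}} \exp\!\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\right),$$

or

$$\Omega(\sigma \mid \theta = 0, \mathbf{x}) = A\,\sigma^{-(n+1)} \exp\!\left(-\frac{\sum_{i=1}^n x_i^2}{2\sigma^2}\right), \qquad (3.7)$$

where

$$A = \frac{\left(\sum_{i=1}^n x_i^2\right)^{n/2}}{2^{\frac{n}{2}-1}\,\Gamma\!\left(\frac{n}{2}\right)}.$$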
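As a numerical sanity check on equation (3.7), the short Python sketch below evaluates $\Omega(\sigma \mid \theta = 0, \mathbf{x})$ in log space (to avoid overflow in $\sigma^{-(n+1)}$ and in $A$ for larger $n$) and integrates it with a plain trapezoid rule; the result should be close to one. The integration range, step count, and function names are arbitrary illustrative choices. The sums of squares used below (0.347778 for the first two points of Table 1 and 2.499003 for all nineteen) were computed from the table.

```python
import math


def posterior_sigma_density(sigma: float, n: int, sum_sq: float) -> float:
    """Equation (3.7): A * sigma**-(n+1) * exp(-sum_sq / (2 sigma**2)),
    with A = sum_sq**(n/2) / (2**(n/2 - 1) * Gamma(n/2)); evaluated in log space."""
    if sigma <= 0.0:
        return 0.0
    log_a = (n / 2) * math.log(sum_sq) - (n / 2 - 1) * math.log(2.0) - math.lgamma(n / 2)
    return math.exp(log_a - (n + 1) * math.log(sigma) - sum_sq / (2.0 * sigma ** 2))


def trapezoid(f, lo: float, hi: float, steps: int = 100_000) -> float:
    """Composite trapezoid rule on [lo, hi]."""
    h = (hi - lo) / steps
    inner = sum(f(lo + k * h) for k in range(1, steps))
    return h * (0.5 * f(lo) + inner + 0.5 * f(hi))


for n, s in ((2, 0.347778), (19, 2.499003)):
    total = trapezoid(lambda sig: posterior_sigma_density(sig, n, s), 0.0, 100.0)
    print(n, round(total, 4))
```

The upper limit of 100 truncates a small tail; for $n = 2$ the neglected probability is $1 - e^{-\sum x_i^2/(2 \cdot 100^2)} \approx 2 \times 10^{-5}$.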


The posterior distribution of uncertainty for $\sigma$ is now defined. The next step in the process is to choose
random samples of the standard deviation from this distribution of uncertainty and use them in the Pk simulation to
generate the distribution of belief regarding Pk. To sample the standard deviation, one needs to randomly select a
value, $u$, from a uniformly distributed random variable, $U[0,1]$, and solve the following expression for $\sigma_0$:


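$$u = \text{Prob}(\sigma \le \sigma_0) = \int_0^{\sigma_0} \Omega(\omega \mid \theta = 0, \mathbf{x})\, d\omega.$$

Note that by substituting in equation (3.7),

$$\text{Prob}(\sigma \le \sigma_0) = \int_0^{\sigma_0} A\,\omega^{-(n+1)} \exp\!\left(-\frac{\sum_{i=1}^n x_i^2}{2\omega^2}\right) d\omega. \qquad (3.8)$$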


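Equation (3.8) can be solved explicitly by making the following substitution:

$$y = \frac{\sum_{i=1}^n x_i^2}{\omega^2}, \qquad dy = -2\omega^{-3} \sum_{i=1}^n x_i^2\, d\omega,$$

with $y \to \infty$ as $\omega \to 0$, and $y \to \sum_{i=1}^n x_i^2 / \sigma_0^2$ as $\omega \to \sigma_0$.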
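Substituting into equation (3.8) yields

$$\text{Prob}(\sigma \le \sigma_0) = \int_{\sum x_i^2/\sigma_0^2}^{\infty} \frac{y^{\frac{n}{2}-1} e^{-y/2}}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}\, dy = \bar{G}_n\!\left(\frac{\sum_{i=1}^n x_i^2}{\sigma_0^2}\right),$$

where $\bar{G}_n(y) = 1 - G_n(y)$, and $G_n$ is the cumulative distribution function of a $\chi^2$ random variable
with $n$ degrees of freedom. Since $u = \text{Prob}(\sigma \le \sigma_0)$,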


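$$G_n\!\left(\frac{\sum_{i=1}^n x_i^2}{\sigma_0^2}\right) = 1 - u.$$

Equivalently,

$$\frac{\sum_{i=1}^n x_i^2}{\sigma_0^2} = \chi^2_{n,1-u}, \qquad (3.9)$$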


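where $\chi^2_{n,1-u}$ denotes the $1-u$ percentile point of a chi-squared random variable with $n$ degrees of freedom
(the value it falls below with probability $1-u$). Solving equation (3.9) for $\sigma_0$ yields

$$\sigma_0 = \left(\frac{\sum_{i=1}^n x_i^2}{\chi^2_{n,1-u}}\right)^{1/2}.$$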

Since $U$ is uniform from 0 to 1 if, and only if, $1-U$ is uniform from 0 to 1, we have

$$\sigma = \left(\frac{\sum_{i=1}^n x_i^2}{\chi^2_{n,U}}\right)^{1/2}. \qquad (3.10)$$


Equation (3.10) expresses $\sigma$, treated as a random variable due to uncertainty regarding its true value, as a
function of the uniform random variable on [0, 1] and the test data. Note that $\sigma$ depends on the test data only through
the values of $n$ and $\sum_{i=1}^n x_i^2$.
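Equation (3.10) translates directly into a sampler. The Python sketch below is one way to build it with only the standard library (an assumption; any statistics package with a chi-square percentile function would serve equally well): `chi2_cdf` evaluates $G_n$ through the power series for the regularized lower incomplete gamma function, and `chi2_percentile` inverts it by bisection. The function names and the bisection bracket [0, 200] are illustrative choices.

```python
import math


def chi2_cdf(x: float, n: int) -> float:
    """G_n(x): chi-square CDF, via the power series for the regularized
    lower incomplete gamma function P(n/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = n / 2, x / 2
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def chi2_percentile(u: float, n: int, hi: float = 200.0) -> float:
    """chi^2_{n,u}: the point below which a chi-square(n) variate falls with
    probability u, found by bisection on chi2_cdf."""
    lo = 0.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, n) < u:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def sigma_from_u(u: float, n: int, sum_sq: float) -> float:
    """Equation (3.10): sigma = sqrt(sum(x_i^2) / chi^2_{n,u})."""
    return math.sqrt(sum_sq / chi2_percentile(u, n))


# Larger u gives a larger chi-square percentile and therefore a smaller sigma.
print([round(sigma_from_u(u, 19, 2.499003), 3) for u in (0.1, 0.5, 0.9)])
```

For $n = 2$ the percentile has the closed form $\chi^2_{2,u} = -2\ln(1-u)$, which is a convenient check on the series and bisection.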

From equation (3.7) or (3.10), it is clear that the distribution of uncertainty for the standard deviation changes as
more data are acquired. From the data in Table 1, equation (3.10) simplifies to

$$\sigma = \frac{0.5897}{\sqrt{\chi^2_{2,U}}} \quad \text{for } n = 2,$$

and

$$\sigma = \frac{1.5808}{\sqrt{\chi^2_{19,U}}} \quad \text{for } n = 19.$$
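The two constants above depend on the data only through $\sqrt{\sum x_i^2}$, so they can be checked directly from Table 1. The Python sketch below does so (values transcribed from Table 1; the helper name `root_sum_sq` is illustrative). It also indicates that the sample standard deviations quoted in the discussion of Figure 4 (0.590 and 0.372) are consistent with the usual $n-1$ estimator, `statistics.stdev`.

```python
import math
import statistics

# Table 1 values in index order, i = 1, ..., 19
TABLE_1 = [0.417, -0.417, 0.355, -0.355, 0.350, -0.335, 0.340, -0.350, 0.355, -0.350,
           0.365, -0.340, 0.365, -0.360, 0.360, -0.365, 0.365, -0.365, 0.370]


def root_sum_sq(xs):
    """sqrt(sum of x_i^2): the only data-dependent constant in equation (3.10) when theta = 0."""
    return math.sqrt(sum(x * x for x in xs))


for n in (2, 19):
    data = TABLE_1[:n]
    # numerator constant of equation (3.10) and the usual (n - 1) sample standard deviation
    print(n, round(root_sum_sq(data), 4), round(statistics.stdev(data), 3))
```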
For a given sample size $n$, one can randomly select a value $u$ from $U[0, 1]$, determine the corresponding $\chi^2_{n,u}$
percentile, and compute a random value for $\sigma$. The cumulative distribution functions that relate $u$ to the
$\chi^2_{n,u}$ percentiles for $n = 2$ and $n = 19$ are shown in Figures 2 and 3, respectively.


Figure  2.  Chi-squared  Cumulative  Distribution  Function  for  Two  Degrees  of  Freedom. 


Figure  3.  Chi-squared  Cumulative  Distribution  Function  for  Nineteen  Degrees  of  Freedom. 

Figure 4 shows the probability density functions when two and 19 data points are used to update the
uncertainty regarding the standard deviation. The sample standard deviation when two data points are considered
is 0.590. From Figure 4, one can see that there is a high probability that the standard deviation will take on a
value near 0.590. The distribution of uncertainty is also very diffuse and takes on a wide variety of values with
significant probability. When all 19 data points are considered, the sample standard deviation is 0.372. Again, the
distribution of uncertainty is concentrated in that area. Now, however, the distribution is across a much
narrower range of values for the standard deviation. As more data about the performance parameter are collected,
the uncertainty in the distribution parameter that defines the performance parameter decreases. The methodology
proposed in this report quantifies the distribution of uncertainty and allows it to be introduced into the Pk
simulation.


Figure  4.  Posterior  Distributions  of  Uncertainty  Regarding  Performance  Parameter  Standard  Deviation 
Given  Two  and  Nineteen  Data  Points. 

To create the distribution of belief regarding Pk, 600 values for $u$ were selected randomly from $U[0,1]$. These
values for $u$ were then used to determine 600 values of $\sigma$. Methodology development is ongoing to provide a
means  of  determining  the  required  number  of  random  number  draws.  One  method  would  be  to  compare  the  drawn 
distribution  of  uncertainty  with  the  analytical  expression  for  it.  When  the  difference  between  the  two  is  below 
some  acceptable  threshold,  one  would  no  longer  select  another  value.  The  cumulative  distribution  functions  for  the 
uncertainty  regarding  the  standard  deviation  using  two  and  19  data  points  are  shown  in  Figure  5.  The  agreement 
between  the  analytical  functions  and  those  generated  by  randomly  drawing  600  values  is  quite  good  (maximum 
difference  less  than  5  percent)  indicating  that  600  draws  is  sufficient  to  adequately  characterize  the  distribution  of 
uncertainty  for  this  example. 
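The comparison described above (drawn versus analytical distribution of uncertainty) amounts to computing the largest vertical gap between the empirical CDF of the draws and the analytical CDF $\text{Prob}(\sigma \le s) = 1 - G_n\!\left(\sum x_i^2 / s^2\right)$, i.e., a Kolmogorov-Smirnov statistic. A minimal Python sketch, assuming only the standard library: the chi-square variate in equation (3.10) is drawn here with `random.Random.gammavariate(n / 2, 2.0)`, which has the same distribution as inverting $G_n$ at a uniform $u$. The seed is arbitrary, so the printed gaps are not the report's Figure 5 values.

```python
import math
import random


def chi2_cdf(x: float, n: int) -> float:
    """G_n(x) via the series for the regularized lower incomplete gamma P(n/2, x/2)."""
    if x <= 0.0:
        return 0.0
    a, z = n / 2, x / 2
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16:
        k += 1
        term *= z / (a + k)
        total += term
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


def sigma_cdf(s: float, n: int, sum_sq: float) -> float:
    """Analytical Prob(sigma <= s) = 1 - G_n(sum_sq / s^2)."""
    return 1.0 - chi2_cdf(sum_sq / s ** 2, n)


def max_cdf_difference(draws, n: int, sum_sq: float) -> float:
    """Largest vertical gap between the empirical CDF of the draws and the
    analytical CDF (a Kolmogorov-Smirnov statistic)."""
    xs = sorted(draws)
    m = len(xs)
    gap = 0.0
    for i, s in enumerate(xs, start=1):
        f = sigma_cdf(s, n, sum_sq)
        gap = max(gap, abs(i / m - f), abs((i - 1) / m - f))
    return gap


rng = random.Random(1996)
for n, sum_sq in ((2, 0.347778), (19, 2.499003)):
    # Posterior sigma draws per equation (3.10); the chi-square variate is drawn
    # directly as Gamma(n/2, scale 2) instead of inverting G_n (same distribution).
    draws = [math.sqrt(sum_sq / rng.gammavariate(n / 2, 2.0)) for _ in range(600)]
    print(n, round(max_cdf_difference(draws, n, sum_sq), 3))
```

With 600 draws, gaps of a few percent are typical even when the sampler is exact, which is in line with the report's observation that the maximum difference stayed below 5 percent.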


Figure 5. Comparison of the Analytical and Simulated Cumulative Distribution Functions for the
Uncertainty in the Performance Parameter Standard Deviation Using Two and Nineteen Data
Points.
To relate the distribution of uncertainty in the performance parameter standard deviation to Pk, one must
exercise the simulation. Figure 6 shows the sensitivity of $\hat{P}_k$ to variations in the performance parameter standard
deviation. The points in the figure are the estimates of Pk generated using a 25-replication Monte Carlo set of runs
and a given value for the standard deviation. The line in the figure is a third-order polynomial curve fit to the data.

For this example, the $\hat{P}_k$ values corresponding to the randomly drawn sample of 600 $\sigma$'s were used when relating
the performance parameter standard deviation to Pk. The methodology can be exercised in two ways. One way is to
execute the simulation for each value of the standard deviation drawn from the distribution of uncertainty. The
other way is to execute the simulation using a sufficient number of different standard deviations to construct the
relationship between the performance parameter standard deviation and Pk. One could then use this relationship to
compute Pk values that correspond to randomly selected standard deviations instead of executing the simulation.
AMSAA is currently investigating the efficiencies of each approach as part of the follow-on effort to this report.
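The second approach (fit the relationship once, then evaluate it instead of re-running the simulation) can be sketched with a least-squares cubic, matching the third-order polynomial used for Figure 6. The Python sketch below solves the 4 by 4 normal equations with Gauss-Jordan elimination so that it needs only the standard library. The function names are illustrative, clipping the surrogate to [0, 1] is an added assumption, and the points in the usage example are generated from a made-up cubic rather than taken from Figure 6. For poorly scaled inputs, a library least-squares routine such as `numpy.polyfit` would be numerically safer than the normal equations.

```python
from typing import List, Sequence


def fit_cubic(xs: Sequence[float], ys: Sequence[float]) -> List[float]:
    """Least-squares coefficients [a0, a1, a2, a3] of y ~ a0 + a1*x + a2*x^2 + a3*x^3,
    from the normal equations solved by Gauss-Jordan elimination with partial pivoting."""
    m = 4
    # Augmented matrix [X^T X | X^T y] for the Vandermonde design matrix X
    aug = [[sum(x ** (i + j) for x in xs) for j in range(m)]
           + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def surrogate_pk(coef: Sequence[float], sigma: float) -> float:
    """Evaluate the fitted cubic, clipped to [0, 1] because Pk is a probability."""
    value = sum(c * sigma ** i for i, c in enumerate(coef))
    return min(1.0, max(0.0, value))


# Illustration only: points generated from a known (hypothetical) cubic, not Figure 6 data.
true_coef = [0.95, 0.05, -0.40, 0.15]
grid = [0.2 + 0.05 * k for k in range(21)]
coef = fit_cubic(grid, [sum(c * s ** i for i, c in enumerate(true_coef)) for s in grid])
print([round(c, 4) for c in coef], round(surrogate_pk(coef, 0.59), 3))
```

In practice the grid would hold the standard deviations at which the simulation was actually run and the responses would be the corresponding $\hat{P}_k$ values, which carry Monte Carlo noise, so the fit would not reproduce them exactly.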


Figure 6. Sensitivity of $\hat{P}_k$ Estimates Using a 25-Replication Monte Carlo Set to Variations in Performance
Parameter Standard Deviation.

Creating an estimated distribution of belief regarding Pk is simply a matter of combining the probability density
for the standard deviation in Figure 4 with the estimated relationship between Pk and the standard deviation displayed in
Figure 6. The estimated distributions of belief regarding Pk when two and 19 data points are used to update the
distribution of uncertainty regarding $\sigma$ are shown in Figure 7. Note the diffuse nature of the distribution of belief
when two data points are used relative to the distribution of belief when 19 data points are used. Because the
uncertainty in the performance parameter standard deviation decreases with more data, the resulting distribution of
belief becomes more focused about a single value.
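Combining the two pieces can be sketched as follows: each posterior draw of $\sigma$ (equation (3.10)) is pushed through a Pk-versus-$\sigma$ relationship, the resulting values form the estimated distribution of belief, and the belief probability for a goal is the fraction of values above it. In this Python sketch `hypothetical_pk` is an assumed linear stand-in chosen only for illustration; it is not the fitted curve of Figure 6, so the numbers it produces are not the report's. The chi-square variate is drawn with `random.Random.gammavariate(n / 2, 2.0)`, which has the same distribution as inverting $G_n$ at a uniform $u$.

```python
import math
import random
from typing import List


def draw_sigma(rng: random.Random, n: int, sum_sq: float) -> float:
    """One posterior draw of sigma per equation (3.10). The chi-square(n) variate is
    drawn directly as Gamma(n/2, scale 2) rather than by inverting its CDF."""
    return math.sqrt(sum_sq / rng.gammavariate(n / 2, 2.0))


def hypothetical_pk(sigma: float) -> float:
    """Hypothetical, illustrative Pk-versus-sigma relationship (NOT the Figure 6 fit)."""
    return min(1.0, max(0.0, 0.95 - 0.18 * sigma))


def belief_samples(n: int, sum_sq: float, draws: int = 600, seed: int = 0) -> List[float]:
    """Estimated distribution of belief: Pk values for `draws` posterior draws of sigma."""
    rng = random.Random(seed)
    return [hypothetical_pk(draw_sigma(rng, n, sum_sq)) for _ in range(draws)]


def belief_probability(pk_values: List[float], goal: float) -> float:
    """Estimated belief probability that the true Pk exceeds the goal (area right of goal)."""
    return sum(p > goal for p in pk_values) / len(pk_values)


b2 = belief_samples(2, 0.347778)     # sums of squares of the first 2 and all 19 Table 1 points
b19 = belief_samples(19, 2.499003)
for goal in (0.80, 0.85, 0.90):
    print(goal, belief_probability(b2, goal), belief_probability(b19, goal))
```

Sweeping `goal` over a grid and plotting `belief_probability` for each sample set gives a curve of the kind shown in Figure 8.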

Recall from Figure 1 that the belief probability that the true, but unknown, Pk exceeds some goal is simply the
area under the distribution of belief curve that lies to the right of that goal. Figure 8 shows the estimated belief
probability that the true Pk exceeds any goal Pk. The sample standard deviation for the two data points in this
example is 0.590. From Figure 6, the $\hat{P}_k$ associated with that standard deviation is 0.84. This is typically the only
information given to decision makers. Using the information in Figure 8, one can also give the decision maker
some assurance that the true system performance exceeds the estimate of performance by associating an estimated
belief probability of 0.72 with the point estimate of 0.84. After 19 data points have been collected, the sample
standard deviation decreases to 0.372. The missile $\hat{P}_k$ associated with that standard deviation is 0.88 (from Figure
6). The estimated belief probability associated with that missile $\hat{P}_k$ is 0.85. For any goal Pk value, the additional
data points gathered resulted in an increase in the estimated belief probability that the true Pk exceeds that goal.
Additionally, with the information provided by the 19 data points, the estimated belief probability that the true Pk
exceeds 0.84 is 1.0, and the estimated belief probability that it exceeds 0.92 is 0.0.
Figure 7. Estimated Posterior Distributions of Belief Regarding Pk Created Using Two and Nineteen Data
Points.


Figure  8.  Estimated  Belief  Probability  that  the  True  Pk  Exceeds  any  Given  Goal  Using  Two  and  Nineteen 
Data  Points. 

To illustrate how this information could be used by programmatic decision makers, consider that the system is
required to achieve a Pk of 0.8. Early in the life cycle of the program, say Milestone I, the decision maker may be
willing to accept a moderate amount of risk. This manifests itself in allowing the program to proceed even if there
is a low (say 0.7) estimated belief probability that the true missile Pk exceeds the requirement. By using the test
data generated during the program’s life cycle to update the uncertainty about the distribution parameters of key
performance  parameters,  the  estimated  distribution  of  belief  regarding  Pk  will  change.  As  the  program  matures, 
one  would  expect  the  decision  maker  to  demand  less  risk,  so  the  program  must  demonstrate  a  higher  estimated 
belief  probability  that  the  true  Pk  exceeds  the  goal  before  it  would  be  allowed  to  proceed.  If  the  system  is  exceeding 
its  goal  Pk,  a  larger  portion  of  the  updated  estimated  distribution  of  belief  curve  regarding  Pk  will  lie  to  the  right  of 
the  goal  (0.8  in  this  discussion)  provided  our  simulation  estimates  of  Pk  are  sufficiently  accurate,  thus 
demonstrating  a  higher  estimated  belief  probability  that  the  true  system  performance  exceeds  the  goal. 

As discussed earlier, this methodology can be used to determine the optimum allocation of fixed resources for
the collection of data, either to minimize risk or to minimize the data required to demonstrate a given level of risk. The
distribution of belief regarding Pk is related to the uncertainty in the distribution parameters, which is reduced by
acquiring test data. Therefore, a test strategy can be optimized to focus on reducing the uncertainty in those
distribution parameters that contribute most to the diffuseness of the distribution of belief regarding Pk. This
methodology depicts the cause-and-effect relationship between distribution parameter uncertainty and the distribution
of belief regarding Pk, thereby providing valuable information in the development of a test strategy.

CONCLUSIONS 


For some time, the Army community has been trying to determine a way of providing assurance to decision
makers  that  true  overall  system  performance  meets  requirements  when  estimates  of  system  performance  come  from 
simulations.  A  method  of  relating  that  assurance  to  data  gathered  in  the  development  test  program  when  overall 
system  performance  is  not  being  measured  is  also  of  interest.  The  methodology  described  in  this  report  provides  a 
means  of  quantifying  assurance  in  terms  of  Bayesian  belief  probability  and  relates  that  belief  probability  to  test  data 
from  any  source.  By  capturing  the  uncertainty  in  the  distribution  parameters  that  comprise  key  performance 
parameters  when  executing  the  performance  simulation,  one  can  create  a  distribution  of  belief  regarding  the  system 
performance  parameter  based  on  the  body  of  knowledge  about  the  system  at  a  given  time.  Through  this 
distribution  of  belief,  one  can  quantify  the  belief  probability  that  the  true,  but  unknown,  system  performance 
parameter  exceeds  any  given  goal. 

The  example  provided  in  this  report  shows  how  the  methodology  is  executed.  It  is  important  to  note  that  the 
example  looked  only  at  a  subset  of  the  entire  problem.  When  conducting  a  more  comprehensive  analysis,  one 
could  expect  to  encounter  a  variety  of  distributions  and  uncertainty  regarding  any  number  of  parameters  that 
comprise  those  distributions.  There  is  still  much  work  to  be  completed  before  this  methodology  is  a  viable  tool  for 
use  in  Army  missile  system  development  programs.  The  attractive  feature  of  this  work  is  its  potential  for  broad 
application.  It  is  not  limited  to  assessing  belief  probability  in  Pk,  but  rather  is  applicable  to  any  simulation  which 
utilizes  stochastic  inputs  and  processes  to  develop  output. 

FUTURE  WORK 

The focus of future efforts will be to fully develop the methodology to include a variety of distributions (e.g.,
exponential, log-normal) and combinations of uncertainty in the parameters that comprise those distributions.
Although  execution  of  the  methodology  was  manageable  for  the  example  conducted  in  this  report,  efficient 
execution  will  be  of  paramount  importance  when  utilizing  the  methodology  to  support  an  Army  missile  system 
development  program.  Many  of  the  steps  of  the  methodology  for  generating  the  distributions  of  belief  in  this  report 
(construction  of  the  histograms,  generation  of  the  standard  deviation  distributions,  etc.)  were  conducted  manually. 
Future  efforts  will  automate  these  steps  with  the  intent  of  simplifying  the  process.  It  is  expected  that  the  biggest 
obstacle  to  implementing  the  methodology  is  the  number  of  times  the  system  Pk  simulation  must  be  executed.  One 
area  being  pursued  as  a  means  of  reducing  the  number  of  runs  required  is  the  use  of  surface  fits  to  relate  the 
distribution  parameter  values  to  Pk  instead  of  making  a  run  for  each  combination  of  the  distribution  parameter 
values. By implementing these measures for improving efficiency and developing robust methodology for many
distributions and combinations of uncertainty, AMSAA believes this tool will be valuable to the Army in assessing the
performance achieved by a developmental weapon system when the primary means for quantifying that
performance comes from simulation.

There  is  another  source  of  uncertainty  that  contributes  to  the  distribution  of  belief  regarding  Pk  that  was  not 
accounted  for  in  this  report.  As  discussed  earlier  in  this  report,  the  Pk  simulation  is  executed  in  Monte  Carlo 
fashion  a  fixed  number  of  times  (25  replications  in  this  report)  to  develop  an  estimate  of  Pk  for  a  given  set  of 
distribution  parameters.  Because  the  number  of  replications  is  finite,  there  is  always  uncertainty  as  to  what  the  true 
Pk is even if the uncertainty in the distribution parameters is ignored. Developing methodology to incorporate this
element  of  uncertainty  is  another  focus  of  future  efforts. 
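To give a sense of the size of this Monte Carlo uncertainty, treat the 25 replications as independent Bernoulli trials (an assumption made only for this illustration). The binomial standard error of a single estimate near the 0.84 value quoted earlier is then

$$\mathrm{SE}(\hat{P}_k) \approx \sqrt{\frac{\hat{P}_k\,(1-\hat{P}_k)}{N}} = \sqrt{\frac{0.84 \times 0.16}{25}} = \sqrt{0.005376} \approx 0.073,$$

which is larger than the 0.04 difference between the point estimates of 0.84 and 0.88 obtained with two and with 19 data points.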
Abstract


Standard control charts require substantial historical data to estimate the parameters of the underlying
distribution. While that historical data is being accumulated, can one still monitor the process to
determine if it is in control? Can we use subsequent data to refine our initial estimate? The question is
particularly timely if one wishes to apply Statistical Process Control (SPC) methods to short-run
processes. In this paper, we present a predictive control scheme for normal variates based on the predictive
distribution, $p(y \mid x)$, which allows continuously improving control charting from the second observation
at the latest. We include some novel graphics for SPC. We discuss the advantages of this approach, and
give an example.

KEYWORDS:  predictive  inference,  statistical  process  control,  short-run 


Introduction 

In this paper, we develop methods for statistical process control based on the predictive distribution for
normal variates. This allows control methods to be applied almost immediately, instead of waiting for the 20
or 25 rational groups recommended in the literature (Montgomery, 1985). This is particularly advantageous for
short-run process control, where there may never be extensive historical data. It is also advantageous for long-run
control, because the predictive distribution continues to be refined as additional data is accumulated.
Use of the predictive distribution confers other advantages, which will be discussed.

We note that predictive control schemes were proposed originally for inverse Gaussian processes with a
non-informative prior distribution (Olwell, 1996). We extend the idea here to the normal distribution,
and include informative prior distributions and some additional graphic measures for the user.

*Olwell is an assistant professor in the Department of Mathematical Sciences, United States Military Academy, West Point,
NY  10996-1786.  This  research  was  partially  supported  by  the  Army  Research  Laboratory  an  e  a 
of  Excellence,  USMA. 

†Approved for public release; distribution unlimited.
Persistent versus sharp change


The  methods  of  this  paper  focus  on  detecting  an  isolated  special  cause;  that  is,  a  one-time  sharp  departure 
from  the  model  for  the  process.  Control  charts  are  best  for  detecting  these  shifts. 

To detect small persistent changes for start-up data, we recommend self-starting CUSUM charts (Hawkins,
1987) or predictive CUSUM charts, currently under development.

In this spirit, we do not develop or advocate supplementary rules for the predictive charts, since CUSUM
charts are optimal for detecting persistent model shifts (Moustakides, 1986).


Uncertainty  about  parameters 

Parameters for a controlled process are never known. At best, we may have very precise estimates for them.
In this work, we explicitly model the uncertainty about our parameters, and reflect the improved precision
in our knowledge of the parameters that comes with more extensive data.

Before we actually begin to collect data about a process, we may have information or beliefs about how
the process will behave. These beliefs can be based on similar processes, prototyping and engineering studies,
process specification limits, or general belief about how the process “ought” to behave. Very rarely in an
industrial setting will we have no idea about the parameters of the distribution before we actually collect
data.

We can capture these beliefs by modeling the parameters themselves as random variables. Using a
Bayesian approach, we update our beliefs about the parameters as we observe the process.

If  we  truly  have  no  information  about  the  parameters,  or  if  we  wish  to  be  conservative,  we  can  reflect 
that  lack  of  information  by  modeling  the  parameters  with  a  suitably  vague  prior  distribution. 

For example, imagine a production process that fills corn flakes boxes. Before the production line is ever
operational, we might believe that the true mean weight of the product inserted will be 16.1 ounces, give or
take 0.1 ounce. We also might believe that the standard deviation of the process might be 0.5 ounces, give or
take 0.25 ounces. Using these opinions, we could model our belief about the unknown mean by saying that
$\mu \sim N(16.1, 0.01)$. We could represent our belief about the unknown variance using a Gamma distribution.

If we had no prior information about the behavior of the weight of the product, we could use a very flat
prior distribution, letting the variance of our estimates for the mean and standard deviation grow arbitrarily
large.

We  will  discuss  a  technique  for  eliciting  these  prior  beliefs. 

The  key  point  is  that  we  can  and  should  incorporate  these  prior  opinions  into  our  model  for  our  control 
scheme. 
Predictive distributions

Predictive distributions are based on a Bayesian approach. In this discussion, $y$ will refer to the unknown future
observation, while $x$ will refer to the observation(s) already made. $\theta$ will be the unknown process
parameter(s).

We model our data using a parametric distribution, $f(x \mid \theta)$. For example, we might assume that the
observed data follow a normal distribution with unknown parameters. We have an opinion about the
process parameters, which are not known exactly. That opinion is represented by $p(\theta)$. This opinion can
be very strong, or it can be vague.

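We observe the process and collect data. These data are used to update our opinions about the parameters, resulting in $p(\theta \mid x)$. This follows the standard Bayesian approach:

$$p(\theta \mid x) = \frac{f(x \mid \theta)\, p(\theta)}{\int_{\Theta} f(x \mid \theta)\, p(\theta)\, d\theta} \propto f(x \mid \theta)\, p(\theta)$$

We then integrate over the parameter space to obtain the predictive distribution

$$h(y \mid x) = \int_{\Theta} f(y \mid \theta)\, p(\theta \mid x)\, d\theta \qquad (1)$$

where $y$ is a future observation of the process, and $x$ is the historical data.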


The  normal  distribution 

In  this  paper,  our  concern  is  with  processes  modeled  by  the  normal  distribution. 

For ease of computation, we parameterize the normal distribution as $N(\mu, \tau)$, where $\mu$ is the mean and $\tau = 1/\sigma^2$ is the precision. This is a standard notation for the normal distribution when applying Bayesian methods.

We will use conjugate priors here to ease the modeling effort. For $(\mu, \tau)$ we use the Normal-Chi-Square (NoCh) joint distribution, which has four hyper-parameters $(b, c, g, h)$, such that $\mu \mid \tau \sim N(b, c\tau)$ and $h\tau \sim \chi^2_g$. Here and subsequently, we follow the notation of Aitchison and Dunsmore [1975]. Note that c, g, and h must all be non-negative.

The roles of b and c are self-evident: to give the center and scale (as a multiple of $\tau$) of the distribution of the mean.

The roles of g and h are a little more obscure. The Scaled Chi-Square distribution is equivalent to a Gamma distribution with parameters $(g/2, h/2)$ (shape $g/2$, rate $h/2$). It is helpful to remember that, under this prior distribution,

$$E\left(\frac{1}{\tau}\right) = \frac{h}{g-2} \qquad (2)$$

$$\mathrm{Var}\left(\frac{1}{\tau}\right) = \frac{2h^2}{(g-4)(g-2)^2} \qquad (3)$$

This allows us to make statements about the expected value and variance of $1/\tau = \sigma^2$ and then, by matching moments, deduce g and h.

For our corn flake example earlier, we apply the elicited values directly as moments of $1/\tau$: $E(1/\tau) = 0.5$ and $\mathrm{Var}(1/\tau) = 0.25^2 = 1/16$. We can use Equations 2 and 3 to obtain g = 12 and h = 5. From this, it follows that $E(\tau) = g/h = 2.4$ (and $\mathrm{Var}(\tau) = 2g/h^2 = 0.96$). We use $E(\tau)$ to estimate c. In the corn flake example, we had our uncertainty about $\mu$ estimated with a standard deviation of 0.1, resulting in a precision of 100. We then solve

$$100 = c\,E(\tau) = 2.4c$$

resulting in c = 41.67. We round down to c = 40.

Our final set of hyper-parameters is

$$(b, c, g, h) = (16.1,\ 40,\ 12,\ 5)$$
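As a check on the arithmetic above, the moment matching can be written as a short Python sketch. It is an illustration, not the author's spreadsheet, and the helper name `elicit_noch` is hypothetical. It inverts Equations 2 and 3 for g and h (dividing Equation 3 by the square of Equation 2 gives Var(1/τ)/E(1/τ)² = 2/(g - 4)), then sets c so that c·E(τ) equals the elicited precision of μ.

```python
def elicit_noch(b, sd_mu, e_inv_tau, var_inv_tau):
    """Hypothetical helper: NoCh hyper-parameters (b, c, g, h) by moment matching.

    b           -- prior centre for the process mean mu
    sd_mu       -- prior standard deviation for mu
    e_inv_tau   -- elicited E(1/tau)   (Equation 2)
    var_inv_tau -- elicited Var(1/tau) (Equation 3)
    """
    # Dividing Equation 3 by the square of Equation 2 gives Var/E^2 = 2/(g - 4).
    g = 4.0 + 2.0 * e_inv_tau ** 2 / var_inv_tau
    # Equation 2: E(1/tau) = h / (g - 2).
    h = e_inv_tau * (g - 2.0)
    # tau ~ Gamma(shape g/2, rate h/2), so E(tau) = g / h.
    e_tau = g / h
    # Match the prior precision of mu, 1/sd_mu^2, to c * E(tau).
    c = (1.0 / sd_mu ** 2) / e_tau
    return b, c, g, h


# Corn flake example: E(1/tau) = 0.5, Var(1/tau) = 1/16, sd of mu = 0.1.
b, c, g, h = elicit_noch(16.1, 0.1, 0.5, 1.0 / 16.0)
print(b, round(c, 2), g, h)  # 16.1 41.67 12.0 5.0
```

The sketch returns c = 41.67 before rounding; the paper then rounds down to c = 40. With E(1/τ) = 1 and Var(1/τ) = 0.01, as in the strong prior of Example 4, the same function gives g = 204 and h = 202.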

For  those  rare  situations  where  we  truly  have  no  prior  opinion  about  the  parameters,  we  can  use  zero 
values  for  c,  g,  and  h  to  reflect  our  uncertainty. 

Given these priors, we still need the posterior distributions for the parameters and the predictive distribution $h(y \mid x)$. We will use sufficient statistics for the data. For our historical data with k observations, we represent the sample mean $m = \bar{x}$ and the corrected sum of squares $v = \sum_i (x_i - \bar{x})^2$. For the future sample, y, of size K we have the sufficient statistics M and V, respectively. Notice we use lower case letters for our observed data, and capital letters for the future unobserved data.

The  calculations  are  extensive,  and  we  will  define  intermediate  terms  to  simplify  the  notation.  Aitchison 
and  Dunsmore  [1975]  provide  the  relevant  posterior  and  predictive  distributions  and  notation: 


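$$p(\mu, \tau \mid x) \sim \mathrm{NoCh}(B, C, G, H) \qquad (4)$$

$$p(M \mid x) \sim \mathrm{St}\left(G,\ B,\ \left(\frac{1}{C} + \frac{1}{K}\right)\frac{H}{G}\right) \qquad (5)$$

$$p(V \mid x) \sim \mathrm{Si}(G,\ K - 1,\ H) \qquad (6)$$

where

$$C = c + k, \qquad B = \frac{cb + km}{C}, \qquad \Delta(c) = \begin{cases} 0 & (c = 0) \\ 1 & (c > 0) \end{cases}$$

$$G = g + k - 1 + \Delta(c), \qquad H = h + v + \frac{ck(m - b)^2}{c + k}$$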

For $v > 0$, Si(k, g, h) is a Siegel distribution with density

$$f(v; k, g, h) = \frac{v^{(g/2)-1}}{\mathrm{Beta}(k/2,\, g/2)\; h^{g/2}\; (1 + v/h)^{(k+g)/2}}$$

where $\mathrm{Beta}(\cdot, \cdot)$ denotes the beta function. St(k, b, c) is a location-scale transformed Student distribution, with k degrees of freedom, centered at b, and scaled by the factor c, so that $(y - b)/\sqrt{c}$ has a standard Student distribution with k degrees of freedom.

We note that the marginal distribution for $\mu \mid x$ is a Student's distribution, $\mathrm{St}(G, B, H/(CG))$.

While  the  distributions  look  formidable,  the  calculations  are  easily  relegated  to  a  computer.  All  the  user 
will  see  in  our  implemented  scheme  are  4  charts.  Once  the  prior  is  established,  the  operator  will  only  input 
M,  V,  and  K  for  each  sample. 

Calculations  are  simplified  by  identities  allowing  probabilities  involving  the  posterior  distributions  to  be 
written  in  terms  of  incomplete  beta  functions,  as  noted  in  Aitchison  and  Dunsmore  (1975). 
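To illustrate the incomplete-beta route, the Python sketch below evaluates both predictive distribution functions with SciPy's regularized incomplete beta function `scipy.special.betainc(a, b, x)`. The function names are hypothetical. It assumes c in St(k, b, c) is a squared scale, and uses the fact that V/h has a beta-prime(g/2, k/2) law under Si(k, g, h); the results are cross-checked against SciPy's own Student t and F distributions.

```python
import math

from scipy import stats
from scipy.special import betainc  # regularized incomplete beta I_x(a, b)


def student_cdf(y, k, b, c):
    """P(Y <= y) for Y ~ St(k, b, c), with c the squared scale factor."""
    t = (y - b) / math.sqrt(c)
    # For a standard Student T with k d.f., P(T > |t|) = I_{k/(k+t^2)}(k/2, 1/2) / 2.
    tail = 0.5 * betainc(k / 2.0, 0.5, k / (k + t * t))
    return tail if t <= 0 else 1.0 - tail


def siegel_cdf(v, k, g, h):
    """P(V <= v) for V ~ Si(k, g, h): V/h has a beta-prime(g/2, k/2) law."""
    return betainc(g / 2.0, k / 2.0, v / (v + h))


# Cross-check against SciPy's Student and F distributions.
k, b, c = 4.0, 3.0, 0.7
print(student_cdf(4.2, k, b, c), stats.t.cdf((4.2 - b) / math.sqrt(c), k))
G, K, H = 9.0, 5, 12.0
v = 6.0
# For the predictive Si(G, K-1, H), V*G/((K-1)*H) has an F(K-1, G) law.
print(siegel_cdf(v, G, K - 1, H), stats.f.cdf(v * G / ((K - 1) * H), K - 1, G))
```

Each pair of printed numbers should agree to machine precision, since SciPy evaluates the t and F distribution functions through the same incomplete beta function.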


The  scheme 


We elicit a prior distribution for $(\mu, \tau)$. This requires judgment and process knowledge. If we specify an unnecessarily vague prior distribution, we will be slower to detect out-of-control states until we have accumulated more data. However, if we specify a precise but mis-located prior, we will signal immediately. The advantage of using informative priors is quicker sensitivity to either mis-specified priors or an out-of-control process.

Once the prior is identified, we start the process. For the first sample x, we obtain the posterior distribution for $(\mu, \tau) \mid x$ and a predictive distribution for $y \mid x$. We then draw the second sample. We find a p-value for the second sample, using the predictive distribution based on the earlier observation(s). If the p-value is too extreme compared to critical p values, we signal an out-of-control situation. Otherwise, we incorporate the second sample into the historical data set, recalculate our historical summary statistics m and v, and construct an updated posterior distribution and predictive distribution.
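The sequential procedure just described can be sketched as a loop. This is a minimal illustration under stated assumptions, not the author's spreadsheet. The function name is hypothetical. Each subgroup is assumed to contain at least two values, and the same critical p values are applied to both the M and V percentiles. SciPy's Student t and F distribution functions stand in for the incomplete-beta identities, since they evaluate the same quantities. The predictive laws used are M|x ~ St(G, B, (1/C + 1/K)H/G) and V|x ~ Si(G, K - 1, H).

```python
import math

from scipy import stats


def bayes_predictive_spc(samples, b, c, g, h, p_lower, p_upper):
    """Return (group number, p_M, p_V) for the first signal, or None.

    samples          -- rational subgroups (each a list of at least two values)
    b, c, g, h       -- NoCh prior hyper-parameters
    p_lower, p_upper -- critical p values for both the M and V percentiles
    """
    history = []  # all earlier in-control observations, pooled
    for j, y in enumerate(samples, start=1):
        K = len(y)
        M = sum(y) / K
        V = sum((yi - M) ** 2 for yi in y)
        k = len(history)
        m = sum(history) / k if k else 0.0  # only ever multiplied by k = 0 if unused
        v = sum((xi - m) ** 2 for xi in history)
        # Posterior NoCh(B, C, G, H) from the prior and the pooled history.
        C = c + k
        G = g + k - 1 + (0 if c == 0 else 1)
        if C <= 0 or G <= 0:
            history.extend(y)  # vague prior and no history yet: nothing to score
            continue
        B = (c * b + k * m) / C
        H = h + v + c * k * (m - b) ** 2 / C
        if H <= 0:
            history.extend(y)
            continue
        # Predictive percentile of M: St(G, B, (1/C + 1/K) H / G).
        scale = math.sqrt((1.0 / C + 1.0 / K) * H / G)
        p_m = stats.t.cdf((M - B) / scale, G)
        # Predictive percentile of V: Si(G, K - 1, H), via its F(K - 1, G) form.
        p_v = stats.f.cdf(V * G / ((K - 1) * H), K - 1, G)
        if not (p_lower <= p_m <= p_upper and p_lower <= p_v <= p_upper):
            return j, p_m, p_v
        history.extend(y)  # in control: fold the subgroup into the history
    return None


# Ten identical in-control subgroups, then a shifted one, with a vague prior.
p_lo = 1.0 / (2 * 370)
in_control = [[-1.0, -0.5, 0.0, 0.5, 1.0]] * 10
shifted = [[9.0, 9.5, 10.0, 10.5, 11.0]]
print(bayes_predictive_spc(in_control + shifted, 0.0, 0.0, 0.0, 0.0, p_lo, 1 - p_lo))
```

With a vague prior the first subgroup cannot be scored, which matches the missing first M score noted in the caption of Figure 3; the artificial data above signal at the eleventh subgroup.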

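We obtain our critical p values by asking the decision maker to specify a tolerable average run length (ARL) between false signals. Then, with the process in control, we use symmetric probability limits:

$$P(\text{false signal}) = \frac{1}{\mathrm{ARL}}, \qquad p_{\text{lower}} = \frac{1}{2\,\mathrm{ARL}}, \qquad p_{\text{upper}} = 1 - \frac{1}{2\,\mathrm{ARL}}$$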
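The probability limits follow directly from the ARL. The Python sketch below (the function name is hypothetical) computes them and shows that, on the inverse-normal scale used for the M-score chart, ARL = 370 corresponds to limits of about ±3.

```python
from scipy.stats import norm


def probability_limits(arl):
    """Symmetric critical p values giving one false signal per `arl` groups on average."""
    p_lower = 1.0 / (2.0 * arl)
    return p_lower, 1.0 - p_lower


p_lower, p_upper = probability_limits(370)
print(p_lower, p_upper)
# On the rescaled M-score chart (Z = inverse normal of p) these become fixed
# limits of roughly +/- 3, the familiar 3-sigma Shewhart limits.
print(norm.ppf(p_lower), norm.ppf(p_upper))
```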

We continue sampling, checking, incorporating, and recalculating until the process signals. Our results are presented to the decision maker using charts.


The  charts 


We maintain four charts. Two of them show the marginal posterior distributions $\mu \mid x$ and $\tau \mid x$. As we gather more data, these should each approach a point distribution. These allow the process manager to see how much uncertainty remains at any point about the parameters of the process.

The third chart is a rescaled plot of the percentiles of M values against the rational group number. To help distinguish extreme M values, we use the inverse normal transformation of the percentile p of each M value, based on the predictive distribution:

$$Z = \Phi^{-1}(p) \qquad (7)$$

In control, the p-values are uniformly distributed, resulting in $Z \sim N(0, 1)$.

We plot these rescaled percentiles to obtain constant control limits. Without a transformation, we would have to recalculate the control limits for each observation, which is visually distracting and computationally annoying. Finding variable control limits using the predictive distribution is more computationally demanding than finding a percentile, which just involves a simple numerical integration:

$$p = \int_{-\infty}^{M} h(u \mid x)\, du \qquad (8)$$

where $h(u \mid x)$ is the predictive density of the subgroup mean.


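This integral can be expressed as an incomplete beta integral, allowing the use of fast, accurate existing algorithms. This follows from the identity relating the Student distribution function to the incomplete beta function (see Aitchison and Dunsmore): for $Y \sim \mathrm{St}(k, b, c)$ and $t = (y - b)/\sqrt{c}$,

$$P(Y \le y) = \begin{cases} \tfrac{1}{2}\, I_{k/(k+t^2)}\!\left(\tfrac{k}{2}, \tfrac{1}{2}\right), & y \le b \\[4pt] 1 - \tfrac{1}{2}\, I_{k/(k+t^2)}\!\left(\tfrac{k}{2}, \tfrac{1}{2}\right), & y > b \end{cases}$$

where $I_x(\alpha, \beta)$ is the regularized incomplete beta function.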

This third chart has nice asymptotic properties. As the size of the historical record increases, $M \mid x \to N(\mu^*, K\tau^*)$, where $\mu^*$ and $\tau^*$ are the values on which the posterior distributions of $\mu$ and $\tau$ concentrate, and K is the rational group size. The transformation $\Phi^{-1}(p)$ therefore asymptotically just studentizes the observations.

The fourth chart is a plot of the rescaled percentiles for V against the rational group number. Asymptotically, $\tau^* V \mid x \sim \chi^2_{K-1}$. Similar to the third chart, we plot

$$W = F^{-1}(p)$$

where F is the CDF of the $\chi^2_{K-1}$ distribution.
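Both rescalings are one-line quantile calls. The Python sketch below uses SciPy's normal and chi-square quantile functions; the function names are hypothetical, and the limits shown assume ARL = 370 and subgroups of size K = 5. Because the transformed scores have fixed in-control distributions, the limits are the same for every rational group.

```python
from scipy.stats import chi2, norm


def m_score(p):
    """Third chart: Z = Phi^{-1}(p), standard normal when the process is in control."""
    return norm.ppf(p)


def v_score(p, K):
    """Fourth chart: W = F^{-1}(p), F the chi-square CDF with K - 1 degrees of freedom."""
    return chi2.ppf(p, K - 1)


# Constant limits for subgroups of size K = 5 and ARL = 370.
p_lower = 1.0 / 740.0
p_upper = 1.0 - p_lower
print(m_score(p_lower), m_score(p_upper))
print(v_score(p_lower, 5), v_score(p_upper, 5))
```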


Examples 


We use a simulated data set to illustrate the methods for both a vague and an informed prior.

We then use a data set from Montgomery (1991) to illustrate the method for both an informed and a vague prior distribution. The data consist of 25 rational subgroups of size five, measuring the inside diameter of automobile piston rings. The charts behave abnormally, and post-analysis of the entire data set indicates that the data do not follow the assumed distributions.

We  have  implemented  the  calculations  on  QuattroPro  for  Windows,  a  commercially  available,  inexpensive 
spreadsheet.  All  of  the  graphics  are  imported  from  the  spreadsheet.  A  copy  of  the  file,  which  can  be  used 
for  any  data  set,  is  available  from  the  author. 

For all of these examples, we have set the ARL at 370, resulting in performance comparable to 3σ control limits for normal data.
Figure 1: Plot of the sample average for Example 1.


Example  1 

We begin with a vague prior: c = g = h = 0. Under control, the observations follow a N(0, 1) distribution. We draw samples of size 5. We change the distribution only at observation 22, to N(2, 1). The change is noted immediately.

Figure 1 is the plot of sample means. Figure 2 is the plot of sample V. Figure 3 is the plot of the M scores, signaling at observation 22. Figure 4 is the plot of the V scores, which do not signal. Figure 5 is the plot of the distribution of $\mu \mid X$ at the end of the data collection. Figure 6 is the plot of the distribution of $\tau \mid X$, also after sample 25.

Note from Figures 5 and 6 that even after 25 observations, there is a good deal of uncertainty about the parameters of the process, and the most likely values are not the (here known) true values. This argues for continuing to update these distributions past observation 25, as we would do in the predictive scheme.


Example  2 

For this example we maintain the same model as in Example 1. We change the timing of the model departure to occur early in the process, here at observation 6. The process has only 5 samples upon which to base its predictive distribution. Again, we depart to a N(2, 1) distribution.

The  departure  is  again  detected  immediately.  Figure  7  shows  the  plot  of  sample  averages,  Figure  8  the 
plot  of  sample  V,  Figure  9  the  plot  of  M  scores,  and  Figure  10  the  plot  of  V  scores. 


Figure  2:  Plot  of  the  sample  V  for  Example  1. 


Figure 3: Plot of the M scores for Example 1. Note the signal at observation 22. Also note that there is no M score for the first observation, because we have used a vague prior.
Figure 4: Plot of the V scores for Example 1.

Figure 5: Plot of the posterior distribution for $\mu \mid X$ for Example 1, after all 25 observations.
Figure 6: Plot of the posterior distribution for $\tau \mid X$ for Example 1, after all 25 observations.

Figure 7: Plot of the sample averages for Example 2.

Figure 8: Plot of the sample V for Example 2.

Figure 9: Plot of the M scores for Example 2. Note the signal at observation 6.

Figure 10: Plot of the V scores for Example 2.


Example  3 

We turn to the Montgomery data given in Table 6.3 (Montgomery, 1991), again with a vague prior. Figures 11, 12, 13, and 14 contain the plots of the sample average, V, M scores, and V scores, respectively.

Note the aberrant behavior of the plot of V scores in Figure 14. This plot shows values apparently much too low. A Q-Q plot of the sample variances ($S^2$) is given in Figure 15, and indicates that the sample variances for these published data do not appear to be proportional to a $\chi^2_4$ distribution. There are fewer large values of $S^2$ than expected. Accordingly, the plots of the V score do not behave as expected.

While we do not advocate the use of these charts to detect such model departures, we would not otherwise have been prompted to check the Q-Q plot for these data.

Again, Figures 16 and 17 indicate how much uncertainty remains about the process mean and precision.


Example  4 

This last example shows the effect of a strong prior distribution. We revisit Example 2. We assume $\mu \mid \tau \sim N(0, 100\tau)$. We assume that $E(1/\tau) = 1$ and $\mathrm{Var}(1/\tau) = 0.01$, resulting in $\tau \sim \Gamma(204/2, 202/2)$, that is, g = 204 and h = 202. The spreadsheet output for the data is in Figure 18. Note that the plot signals even more strongly for the point which doesn't follow the model. The M scores are separately plotted in Figure 19, for ease of comparison.

Strong  prior  information  improves  the  sensitivity  of  the  scheme. 
Figure 11: Plot of the sample average for Example 3.

Figure 12: Plot of the sample V for Example 3.

Figure 13: Plot of the M scores for Example 3.

Figure 14: Plot of the V scores for Example 3.

Figure 15: Q-Q plot of the sample variances against a $\chi^2_4$ distribution for Example 3. Notice the poor fit. The $R^2$ for the associated regression is 0.85, which is highly significant against the Shapiro-Wilk criterion.

Figure 16: Plot of the posterior distribution for $\mu \mid X$ for Example 3, after all 25 observations.

Figure 17: Plot of the posterior distribution for $\tau \mid X$ for Example 3, after all 25 observations.

Figure 18: Spreadsheet view of the data from Example 4, with a strong prior. Note the strong signal at observation 6 on the plot of M scores.

Figure 19: M scores for the data from Example 4, with a strong prior. Note the strong signal at observation 6.

Conclusion 


We have introduced a quality control scheme to detect isolated special causes based on the joint predictive distribution for the sample sufficient statistics. This scheme allows us to begin valid SPC immediately, without waiting to accumulate historical data. Additionally, the scheme continues to refine itself as more data are collected, resulting in more precise estimates for the process parameters.

The process is easily implemented on a commercial spreadsheet, as we have done here, and could be added to commercial SPC products with little labor. Once the prior distribution has been estimated, the operator needs only to enter the sample mean, the sample standard deviation or sample V, and the sample size.

The  charts  implementing  the  scheme  provide  useful  information  about  the  process  behavior  and  the 
current  sample. 

While  we  have  not  illustrated  this,  the  charts  accept  variable  sample  sizes. 

These tools should be adopted by any practitioner confronted with short runs, or a need for continually improving parameter estimates for the process.
Permutation-based, Extrapolated Regression Estimates

(clinical  presentation) 


David  W.  Webb 

U.S.  Army  Research  Laboratory,  APG,  MD 


Explanation  of  the  Problem 

Early  in  1996,  an  electrical  engineer  in  my  branch  challenged  me  with  data  he  had  collected  from  a  firing 
test.  Twenty-two  rounds  were  fired  from  a  Cannon-Caliber  Electromagnetic  Gun  (CCEMG).  Among  the  variables 
that  he  measured  from  each  shot  were  impact  locations  (relative  to  the  aimpoint)  recorded  on  yaw  cards,  and  the 
launch velocity. Launch velocities ranged from 826 m/s to 1,785 m/s; however, when the CCEMG is fully operational its required launch velocity will be 1,850 m/s. The task that my co-worker needed assistance with was predicting the dispersion (i.e., the standard deviation of the impacts) of the CCEMG at the full design velocity of 1,850 m/s, hereafter denoted $\sigma_{FDV}$.

Several  challenges  confronted  this  effort.  First,  there  was  the  problem  of  deciding  how  to  compute 
dispersion  when  there  are  no  exact  repeat  observations  of  any  velocity.  Second,  a  procedure  was  needed  for 
extrapolating  beyond  the  observed  range  of  velocities  to  predict  the  dispersion  at  full  design  velocity. 

To  address  the  issue  of  calculating  dispersions  in  the  absence  of  repeat  observations,  shots  were  grouped 
according  to  a  near-neighbors  philosophy;  then  within  each  group  the  impact  dispersion  and  average  launch 
velocity were computed. The engineer had already divided the rounds into four groups. He had designated a low-velocity (800-850 m/s) group consisting of four rounds; two mid-velocity (1,000-1,200 m/s) groups of five and seven rounds; and a high-velocity (1,250-1,800 m/s) group of six rounds. The mid-velocity groups were distinguished by whether or not the bore of the cannon was honed (cleaned) before each firing. Using a stem-and-leaf plot, Table 1 shows the distribution of the 22 launch velocities and the classification of the rounds. [Note: Placing the 1,282 m/s round in one of the mid-velocity groups would have made it closer to its neighbors; however, it remained in the high-velocity group to stick with the engineer's convention.]

Because there was no prior assumption as to the true physical relationship between velocity and dispersion, a simple linear regression between these variables was used. However, the prospect of using a linear regression of just four data points (one per velocity group) to obtain the prediction of $\sigma_{FDV}$ seemed quite tenuous. Therefore, rounds within a group were partitioned into smaller subgroups consisting of two rounds each (three rounds for one of the subgroups if the group size was odd), thereby "creating" more data points for the regression. The number of possible ways to partition the groups into subgroups is shown in Table 2.


Low velocity      800 | 26 42 48 48
                  900 |
Mid velocity     1000 | 15 63 75 81 87 87
                 1100 | 39 40 76 90 90 90
                 1200 | 82
                 1300 |
High velocity    1400 | 49 92
                 1500 | 15
                 1600 | 39
                 1700 | 85

(Stem = hundreds of m/s; for example, 800 | 26 denotes 826 m/s.)

Table  1:  Stem-and-leaf  plot  of  the  22  launch  velocities  observed  in  the  test.  Italicized  figures  indicate  that  the  bore 
was  honed  prior  to  firing.  These  rounds  form  one  of  the  two  mid-velocity  groups. 

By then forming all possible permutations of the subgroups, it would be possible to obtain (3)(10)(105)(15) = 47,250 unique regressions of dispersion versus average launch velocity, along with the same number of predictions of $\sigma_{FDV}$. Upon ordering these 47,250 estimates, one could then obtain a 90% confidence interval for $\sigma_{FDV}$. (Due to the high degree of uncertainty associated with extrapolated estimates, a confidence interval for $\sigma_{FDV}$ was deemed more appropriate than a point estimate.)

However, at this early stage there was a critical flaw in the analysis. Note in Table 1 that the launch velocities of the four low-velocity rounds are relatively close to each other. Therefore, the use of an average velocity as the independent variable in a regression, although technically a violation of the usual regression assumptions, should not be of grave concern. On the other hand, the launch velocities of the high-velocity rounds are quite different, ranging from a low of 1,282 m/s to a high of 1,785 m/s. Is it reasonable to consider rounds with such different launch velocities as near-neighbors and allow their inclusion in the same partition? Probably not.

To address this, a closeness criterion was implemented which stated that rounds from within the same group could not be partitioned into the same subgroup if their launch velocities differed by 170 m/s or more. While, admittedly, this value of 170 m/s may still seem to be too high, it was the smallest difference one could use to still acquire three subgroups of two rounds each from the high-velocity group. With this new restriction on the permutations, the number of possible regressions and estimates of $\sigma_{FDV}$ dropped from 47,250 to just 1,890 (see Table 3).


Group               Group size   Subgroup sizes   Number of partitions possible
1 - Low velocity         4       2, 2             C(4,2) C(2,2) / 2! = 3
2A - Mid velocity        5       2, 3             C(5,2) C(3,3) = 10
2B - Mid velocity        7       2, 2, 3          C(7,2) C(5,2) C(3,3) / 2! = 105
3 - High velocity        6       2, 2, 2          C(6,2) C(4,2) C(2,2) / 3! = 15

(C(n, r) denotes the binomial coefficient "n choose r".)

Table  2:  Summary  of  all  possible  partitions  of  the  four  groups  into  subgroups  of  size  two  (and  three  if  necessary). 


Group               Group size   Subgroup sizes   Number of partitions possible
1 - Low velocity         4       2, 2             3
2A - Mid velocity        5       2, 3             6
2B - Mid velocity        7       2, 2, 3          105
3 - High velocity        6       2, 2, 2          1

Table 3: Summary of all possible partitions of the four groups into subgroups of size two (and three, if necessary), when the closeness criterion (no shots within the same subgroup having launch velocities differing by 170 m/s or more) is invoked.
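The counts in Tables 2 and 3 can be checked by enumeration. The Python sketch below is illustrative and its function names are hypothetical. The launch velocities are read from the stem-and-leaf plot in Table 1. Because the italics marking the honed-bore rounds are not recoverable here, the membership of the two mid-velocity groups is unknown, so only the low- and high-velocity groups are enumerated under the 170 m/s criterion; the published counts of 6 and 105 are used for Groups 2A and 2B.

```python
from collections import Counter
from itertools import combinations
from math import factorial


def n_partitions(n, sizes):
    """Unrestricted count of ways to split n rounds into unordered subgroups of `sizes`."""
    denom = 1
    for s in sizes:
        denom *= factorial(s)
    for mult in Counter(sizes).values():  # subgroups of equal size are interchangeable
        denom *= factorial(mult)
    return factorial(n) // denom


def partitions(idx, sizes):
    """Yield each partition of index tuple `idx` into subgroups of `sizes` exactly once."""
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for size in sorted(set(sizes)):  # size of the subgroup holding the first round
        remaining = list(sizes)
        remaining.remove(size)
        for mates in combinations(rest, size - 1):
            left = tuple(i for i in rest if i not in mates)
            for tail in partitions(left, remaining):
                yield [(first,) + mates] + tail


def n_close_partitions(velocities, sizes, limit=170.0):
    """Count partitions in which no subgroup spans `limit` m/s or more."""
    count = 0
    for part in partitions(tuple(range(len(velocities))), sizes):
        spans = [max(velocities[i] for i in grp) - min(velocities[i] for i in grp)
                 for grp in part]
        if all(span < limit for span in spans):
            count += 1
    return count


# Launch velocities (m/s) read from the stem-and-leaf plot in Table 1.
low = [826, 842, 848, 848]
high = [1282, 1449, 1492, 1515, 1639, 1785]

print(n_partitions(4, [2, 2]) * n_partitions(5, [2, 3])
      * n_partitions(7, [2, 2, 3]) * n_partitions(6, [2, 2, 2]))  # 47250
print(n_close_partitions(low, [2, 2]), n_close_partitions(high, [2, 2, 2]))  # 3 1
```

Under the criterion, the 1,282 m/s round can only be paired with the 1,449 m/s round, which forces a single admissible partition of the high-velocity group; together with the counts of 3, 6, and 105 this gives 3 × 6 × 105 × 1 = 1,890.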
A final issue to address was how to use all of the yaw card data in the computation of a dispersion estimate
for  a  particular  subgroup.  I  did  not  want  to  discard  all  data  from  the  nearer  yaw  cards  and  use  only  the  most  distant 
yaw  cards  (where  flight  perturbations  have  damped  out  and  the  round  is  most  stable).  On  the  other  hand,  I  did  not 
think  it  wise  to  give  equal  weight  to  impact  data  from  the  nearest  and  the  farthest  yaw  cards,  since  at  close  range  the 
flight  is  not  stable  and  dispersion  measurements  tend  to  be  inflated. 


The formula decided upon as an estimator for the dispersion for a subgroup involved the following: for each yaw card distance, if two or more rounds had impact data, the data were used to compute a dispersion at that particular yaw card distance. Denote this dispersion by $s_i$, where i indicates yaw card number and i = 1, 2, ..., n. Furthermore, let $d_i$ be the distance from the muzzle of the CCEMG to yaw card station i, and $n_i$ be the number of rounds within the subgroup with impact data at yaw card station i. Then the weighted estimate of dispersion for the subgroup is given by the formula

$$s^{*} = \sqrt{\frac{\sum_{i:\, n_i \ge 2} d_i\,(n_i - 1)\, s_i^2}{\sum_{i:\, n_i \ge 2} d_i\,(n_i - 1)}}$$


This  "quasi-dispersion”  formula  is  similar  to  the  usual  pooling  equation  for  sample  standard  deviations  except  that  it 
includes  weighting  by  the  distance  to  each  yaw  card,  so  as  to  minimize  the  influence  of  impact  data  closer  to  the 
muzzle. 


Table 4 illustrates the use of this formula using data from one of the subgroups of Group 2A. For these rounds, thirteen yaw cards were stationed along the projectile's path to record impact locations. Notice that the first four yaw cards did not yield any impacts (due to improper positioning of the cards) and thus did not contribute to the calculation of $s^*$.

At this stage of the analysis one proceeds to form all 1,890 partitions of the four groups of data, each time applying linear regression to obtain an estimate of $\sigma_{FDV}$. As outlined earlier, after ordering all 1,890 estimates, the outer 5% quantiles are used to form a 90% confidence interval for $\sigma_{FDV}$.
                    Horizontal impact location
 i    d_i     Round 1   Round 2   Round 3   n_i   s_i     d_i(n_i - 1)   d_i(n_i - 1)s_i^2
 1      5.0   n/a       n/a       n/a       0     n/a     n/a            n/a
 2      9.8   n/a       n/a       n/a       0     n/a     n/a            n/a
 3     15.0   n/a       n/a       n/a       0     n/a     n/a            n/a
 4     20.0   n/a       n/a       n/a       0     n/a     n/a            n/a
 5     25.0   1.177     2.370     2.535     3     0.741    50.0           27.46
 6     29.9   0.968     2.120     2.857     3     0.952    59.8           54.20
 7    170.1   n/a       n/a       3.285     1     n/a     n/a            n/a
 8    175.0   n/a       n/a       3.388     1     n/a     n/a            n/a
 9    180.0   1.892     2.766     n/a       2     0.618   180.0           68.75
10    185.0   1.800     2.350     3.314     3     0.766   370.0          217.31
11    190.3   1.832     2.427     3.447     3     0.817   380.6          253.90
12    220.8   n/a       n/a       3.374     1     n/a     n/a            n/a
13    222.0   1.946     2.631     3.545     3     0.802   444.0          285.75

Σ d_i(n_i - 1) = 1484.4        Σ d_i(n_i - 1)s_i^2 = 907.37
s* = √(907.37 / 1484.4) = 0.782

Table 4: Sample calculation of "quasi-dispersion"
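The table's arithmetic can be reproduced directly from the impact locations. The Python sketch below is an illustration, not the author's calculation tool. It assumes each $s_i$ is the ordinary sample standard deviation with an n - 1 divisor, which matches the tabulated 0.741 for yaw card 5. Cards with fewer than two impacts are skipped, as in the table.

```python
import math
from statistics import stdev

# (d_i, horizontal impact locations) for the thirteen yaw cards in the table above.
cards = [
    (5.0, []), (9.8, []), (15.0, []), (20.0, []),
    (25.0, [1.177, 2.370, 2.535]),
    (29.9, [0.968, 2.120, 2.857]),
    (170.1, [3.285]), (175.0, [3.388]),
    (180.0, [1.892, 2.766]),
    (185.0, [1.800, 2.350, 3.314]),
    (190.3, [1.832, 2.427, 3.447]),
    (220.8, [3.374]),
    (222.0, [1.946, 2.631, 3.545]),
]


def quasi_dispersion(cards):
    """Distance-weighted pooled standard deviation s*."""
    num = den = 0.0
    for d, impacts in cards:
        n = len(impacts)
        if n < 2:  # s_i is undefined with fewer than two impacts
            continue
        weight = d * (n - 1)
        num += weight * stdev(impacts) ** 2
        den += weight
    return math.sqrt(num / den)


print(round(quasi_dispersion(cards), 3))  # 0.782
```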

Questions  for  the  panel: 

1. Is the interval formed truly a confidence interval for the dispersion at full design velocity, or is it more akin to a prediction interval for a single observation, or is it something else?

2. The decision to use simple linear regression was made to keep the analysis as uncomplicated as possible, given the errors in the independent variable (the averaged launch velocities). Is this a reasonable choice, despite the fact that physics might suggest using either transformed variables or a more complex regression model?

3.  Is  the  all-possible-permutations  approach  to  resampling  the  data  adequate,  or  should  a  bootstrap  method  have 
been  used  to  randomly  resample? 

4. Is the use of distance as a weighting factor in my "quasi-dispersion" ill-advised?

5. Are there other strategies for estimating $\sigma_{FDV}$ to recommend?
410-278-7543
DSN  298-7543 

ABSTRACT 

The Wilcoxon Signed-Rank Test is a nonparametric test for the
equivalence of population medians whose statistic is based on the differences
between observations in an ordered pair. This test could be used to determine
if the skill level of a soldier before and after training is significantly
different. The Wilcoxon Signed-Rank Test is well documented in numerous
statistics books such as Conover's Practical Nonparametric Statistics and
others.


The author has performed a limited simulation study and seeks panel
comment.


OBJECTIVE 

The objective of this work is to determine what is "lost" (less
powerful) statistically as the sample size decreases when using the Wilcoxon
Signed-Rank Test (i.e., how much do you "lose" statistically if the number of
soldiers (comparisons) available for a given test decreases from 20 to 12, from
20 to 18, from 30 to 10, and so on?)


BACKGROUND  INFORMATION 

A  simulation  was  done  to  compare  the  results  of  the  Wilcoxon  Signed-Rank 
Test  when  the  sample  sizes  (number  of  soldiers)  changed.  The  sample  sizes 
used  in  this  simulation  were  10  to  96  in  increments  of  2.  The  probability  of 
detecting  a  difference  was  calculated  for  the  sample  sizes  over  various  given 
probabilities (.5 to .9). (For example, a given probability of .700 implies,
for sample sizes of 20, 18, and 12, that the average number of positive
differences is approximately 14 (.7 * 20), 12.6 (.7 * 18), and 8.4 (.7 * 12),
respectively. In other words, approximately 14 of the 20 measurements in the
first group are greater than the measurements in the second group. The number
of differences would have to be an integer, but for comparisons across
different sample sizes, the same percentage of each sample size was used.)

The  simulation  was  done  by  the  following  method: 
(1) N uniform random numbers (0 to 1) were generated (N = 10 to 96 in
increments of 2).

(2)  The  integers  from  1  to  N  were  put  in  a  column  next  to  the  random  numbers. 

(3) The various "given probabilities" (.5, .6, .7, .8, and .9) were compared
with the random numbers. If the random number is less than the "given
probability", the comparison is different; otherwise the comparison is the
same.

(4)  If  the  comparisons  were  different,  the  integers  (from  1  to  N)  were 
considered  negative,  otherwise  positive. 

(5) The quotient of the "sum of the signed integers" and the "square root of the
sum of the squares of the integers" was determined.

(6)  This  procedure  was  done  1000  times. 

(7) A count was done to determine the number of quotients greater than 1.645
(z value of the upper 0.95 level) or less than -1.645 (z value of the lower .05
level). This count was divided by the number of simulations (1000). This
quotient is the probability that the two samples are declared different at the
.1 significance level (alpha = .10) for a given probability.

(8)  This  whole  procedure  was  repeated  50  times. 
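The simulation steps (1) through (8) can be reproduced in a few lines of Python. This is a sketch that follows the steps as written, not the author's program; the function names and the fixed random seed are illustrative assumptions. Because only the signs of the integers 1 to N are randomized, the quotient in step (5) is the usual large-sample z statistic for the signed-rank sum, whose null variance is the sum of the squared ranks.

```python
import math
import random


def detection_rate(n, p, rng, reps=1000, z_crit=1.645):
    """Steps (1)-(7): share of `reps` trials with |z| > z_crit (alpha = .10)."""
    denom = math.sqrt(sum(i * i for i in range(1, n + 1)))
    hits = 0
    for _ in range(reps):
        # Integer i is counted as negative when its uniform draw falls below p.
        s = sum(-i if rng.random() < p else i for i in range(1, n + 1))
        z = s / denom  # step (5)
        if z > z_crit or z < -z_crit:  # step (7)
            hits += 1
    return hits / reps


def averaged_rate(n, p, repeats=50, reps=1000, seed=1):
    """Step (8): average of `repeats` independent detection rates."""
    rng = random.Random(seed)
    return sum(detection_rate(n, p, rng, reps) for _ in range(repeats)) / repeats


# N = 20 with a given probability of .7; Table 1 below reports .477.
print(averaged_rate(20, 0.7))
```

Setting p = .5 gives the null case, where the detection rate should sit near the nominal .10, as in the first column of Table 1.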

The averages of the 50 quotients for the various "given probabilities"
(.5 to .9) are shown in Table 1 and Figure 1. The minimum, maximum, and
average of these quotients are shown in Figures 2 through 6 and Appendix A.
TABLE 1. PROBABILITY OF DETECTING THAT ITEMS ARE DIFFERENT (ALPHA = .10) FOR
SAMPLE SIZES OF N FOR GIVEN PROBABILITIES OF .5, .6, .7, .8 AND .9


               Given probability
  N     0.5     0.6     0.7     0.8     0.9
 10    0.106   0.161   0.320   0.573   0.851
 12    0.114   0.175   0.363   0.636   0.897
 14    0.105   0.178   0.390   0.687   0.933
 16    0.102   0.192   0.429   0.736   0.958
 18    0.099   0.189   0.445   0.770   0.972
 20    0.096   0.196   0.477   0.807   0.984
 22    0.098   0.208   0.508   0.841   0.990
 24    0.100   0.221   0.544   0.873   0.994
 26    0.100   0.228   0.570   0.894   0.996
 28    0.098   0.236   0.597   0.912   0.998
 30    0.101   0.249   0.625   0.928   0.999
 32    0.103   0.257   0.648   0.940   1.000
 34    0.103   0.270   0.673   0.950   0.999
 36    0.101   0.277   0.696   0.960   1.000
 38    0.103   0.289   0.717   0.968   1.000
 40    0.102   0.297   0.736   0.974   1.000
 42    0.099   0.303   0.751   0.978   1.000
 44    0.101   0.315   0.775   0.983   1.000
 46    0.099   0.319   0.785   0.986   1.000
 48    0.096   0.336   0.805   0.989   1.000
 50    0.101   0.341   0.817   0.991   1.000
 52    0.101   0.349   0.829   0.993   1.000
 54    0.101   0.358   0.841   0.994   1.000
 56    0.099   0.368   0.854   0.996   1.000
 58    0.102   0.378   0.864   0.996   1.000
 60    0.100   0.384   0.873   0.997   1.000
 62    0.101   0.392   0.881   0.997   1.000
 64    0.094   0.409   0.900   0.997   1.000
 66    0.099   0.408   0.898   0.998   1.000
 68    0.099   0.418   0.907   0.998   1.000
 70    0.100   0.430   0.915   0.999   1.000
 72    0.100   0.434   0.921   0.999   1.000
 74    0.101   0.445   0.928   0.999   1.000
 76    0.101   0.453   0.933   0.999   1.000
 78    0.102   0.462   0.939   0.999   1.000
 80    0.096   0.475   0.943   0.999+  1.000
 82    0.101   0.478   0.947   0.999+  1.000
 84    0.104   0.486   0.951   0.999+  1.000
 86    0.102   0.493   0.955   0.999+  1.000
 88    0.101   0.503   0.959   0.999+  1.000
 90    0.102   0.508   0.962   0.999+  1.000
 92    0.103   0.516   0.966   0.999+  1.000
 94    0.103   0.523   0.969   0.999+  1.000
 96    0.094   0.530   0.972   0.999+  1.000

Note: For a sample size of 20 with a given probability of .7, the probability of detecting that the items are different is .477 at the .10 significance level.
Figure 1. Probability of detecting item differences for various "given probabilities" and sample sizes.
Figure 2. Probability of detecting that the items are different when the given probability is 0.50, for sample sizes between 10 and 96.
RESULTS/CONCLUSIONS OF SIMULATIONS


(1) For "given probabilities" of x and 1 - x, the probabilities of finding that the items are different are similar due to symmetry. For example, with a large enough sample size, a given probability of .70 yields the same probability of detecting a difference as a given probability of .30 (1 - .70). Therefore, all simulations were done with "given probabilities" greater than or equal to .5.

(2) For all "given probabilities" greater than .5, the probability of detecting that the items are different approaches 1 as the sample size increases. However, for a "given probability" of exactly .5 (no true difference), the probability of detecting that the items are different stays near alpha = .10 for every sample size (between .094 and .114 in Table 1).

(3) As the "given probability" increases from .5, the probability of detecting that the items are different increases, and the sample size required to show a difference decreases. For example, for a "given probability" of .7 and N = 30, the probability of detecting that the items are different is .625. For a "given probability" of .8, only N = 12 is needed to reach .636.

(4) For a "given probability" of .9 and a small sample size of 10, the probability of detecting that the items are different is at least .85. With sample sizes greater than 30, the probability of detecting that the items are different is .999 or greater.

(5)  For  a  "given  probability"  of  .6  and  a  small  sample  size  of  10,  the 
probability  of  detecting  that  the  items  are  different  is  less  than  .2.  In 
order  to  detect  that  the  items  are  different  with  a  probability  of  at 
least  .50,  the  sample  size  would  have  to  be  approximately  88. 

(6) Consider a "given probability" of .7 and sample sizes (number of soldiers, etc.) that decrease from 20 to 12, from 20 to 18, and from 30 to 10. The probability of detecting that the items are different decreases by 24% (.477 to .363), 7% (.477 to .445), and 49% (.625 to .320), respectively.


In short, very little is "lost" statistically when the sample size decreases from 20 to 18. Much more is lost when it decreases from 20 to 12 or from 30 to 10.
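The percentages in conclusion (6) are relative decreases, (old - new) / old, computed from the .7 column of Table 1. A short Python check follows. The dictionary and function names are illustrative only.

```python
# Table 1 entries for a "given probability" of .7 (copied from the table above).
p07 = {10: 0.320, 12: 0.363, 18: 0.445, 20: 0.477, 30: 0.625}

def percent_drop(n_from, n_to, table=p07):
    """Relative loss (in %) of detection probability when N shrinks from n_from to n_to."""
    return 100.0 * (table[n_from] - table[n_to]) / table[n_from]

for n_from, n_to in [(20, 12), (20, 18), (30, 10)]:
    print(f"{n_from} -> {n_to}: {percent_drop(n_from, n_to):.0f}% drop")
# prints: 20 -> 12: 24% drop, 20 -> 18: 7% drop, 30 -> 10: 49% drop (one per line)
```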
Appendix A


Given Probability = .5

 N   Avg     Std      Min     Max
10   0.106   0.0122   0.082   0.134
12   0.114   0.0127   0.088   0.137
14   0.105   0.0128   0.082   0.128
16   0.102   0.0148   0.073   0.130
18   0.099   0.0094   0.077   0.121
20   0.096   0.0088   0.078   0.117
22   0.098   0.0094   0.080   0.118
24   0.100   0.0109   0.078   0.124
26   0.100   0.0092   0.078   0.120
28   0.098   0.0086   0.080   0.113
30   0.101   0.0068   0.078   0.119
32   0.103   0.0015   0.100   0.106
34   0.103   0.0067   0.087   0.116
36   0.101   0.0112   0.076   0.124
38   0.103   0.0088   0.085   0.120
40   0.102   0.0120   0.080   0.124
42   0.099   0.0100   0.082   0.121
44   0.101   0.0066   0.088   0.115
46   0.099   0.0086   0.080   0.115
48   0.096   0.0067   0.081   0.110
50   0.101   0.0086   0.085   0.122
52   0.101   0.0079   0.085   0.113
54   0.101   0.0095   0.085   0.120
56   0.099   0.0096   0.080   0.118
58   0.102   0.0102   0.088   0.128
60   0.100   0.0104   0.082   0.123
62   0.101   0.0053   0.091   0.114
64   0.094   0.0013   0.091   0.096
66   0.099   0.0072   0.085   0.116
68   0.099   0.0118   0.078   0.124
70   0.100   0.0090   0.079   0.124
72   0.100   0.0145   0.081   0.126
74   0.101   0.0071   0.081   0.113
76   0.101   0.0100   0.078   0.120
78   0.102   0.0146   0.073   0.130
80   0.096   0.0060   0.082   0.108
82   0.101   0.0080   0.085   0.116
84   0.104   0.0180   0.067   0.132
86   0.102   0.0145   0.077   0.131
88   0.101   0.0072   0.088   0.121
90   0.102   0.0085   0.080   0.121
92   0.103   0.0084   0.090   0.120
94   0.103   0.0063   0.087   0.115
96   0.094   0.0016   0.088   0.096

Given Probability = .6

 N   Avg     Std      Min     Max
10   0.161   0.0123   0.129   0.198
12   0.175   0.0135   0.147   0.206
14   0.178   0.0146   0.145   0.211
16   0.192   0.0147   0.166   0.220
18   0.189   0.0140   0.162   0.218
20   0.196   0.0118   0.166   0.218
22   0.208   0.0109   0.188   0.233
24   0.221   0.0134   0.189   0.248
26   0.228   0.0109   0.206   0.248
28   0.236   0.0095   0.218   0.257
30   0.249   0.0114   0.223   0.278
32   0.257   0.0116   0.233   0.281
34   0.270   0.0116   0.243   0.291
36   0.277   0.0132   0.254   0.312
38   0.289   0.0120   0.268   0.313
40   0.297   0.0094   0.283   0.317
42   0.303   0.0109   0.276   0.332
44   0.315   0.0099   0.293   0.338
46   0.319   0.0101   0.294   0.344
48   0.336   0.0089   0.314   0.354
50   0.341   0.0093   0.319   0.361
52   0.349   0.0110   0.329   0.375
54   0.358   0.0127   0.325   0.380
56   0.368   0.0114   0.349   0.389
58   0.378   0.0138   0.342   0.399
60   0.384   0.0101   0.365   0.401
62   0.392   0.0168   0.355   0.430
64   0.409   0.0022   0.405   0.414
66   0.408   0.0104   0.390   0.435
68   0.418   0.0099   0.397   0.439
70   0.430   0.0097   0.401   0.453
72   0.434   0.0096   0.420   0.460
74   0.445   0.0132   0.424   0.482
76   0.453   0.0095   0.435   0.473
78   0.462   0.0116   0.434   0.491
80   0.475   0.0126   0.455   0.501
82   0.478   0.0144   0.452   0.507
84   0.486   0.0108   0.463   0.510
86   0.493   0.0126   0.469   0.528
88   0.503   0.0088   0.485   0.520
90   0.508   0.0085   0.484   0.525
92   0.516   0.0096   0.496   0.531
94   0.523   0.0107   0.501   0.548
96   0.530   0.0108   0.513   0.548


Given Probability = .7

 N   Avg     Std      Min     Max
10   0.320   0.0146   0.294   0.365
12   0.363   0.0148   0.338   0.395
14   0.390   0.0166   0.357   0.425
16   0.429   0.0130   0.401   0.452
18   0.445   0.0133   0.420   0.475
20   0.477   0.0158   0.445   0.510
22   0.508   0.0129   0.481   0.529
24   0.544   0.0125   0.523   0.565
26   0.570   0.0129   0.543   0.597
28   0.597   0.0144   0.572   0.638
30   0.625   0.0143   0.598   0.664
32   0.648   0.0138   0.623   0.673
34   0.673   0.0122   0.648   0.699
36   0.696   0.0121   0.675   0.732
38   0.717   0.0130   0.696   0.740
40   0.736   0.0086   0.717   0.755
42   0.751   0.0144   0.721   0.787
44   0.775   0.0152   0.746   0.803
46   0.785   0.0103   0.763   0.806
48   0.805   0.0065   0.793   0.822
50   0.817   0.0089   0.796   0.834
52   0.829   0.0081   0.811   0.849
54   0.841   0.0079   0.824   0.858
56   0.854   0.0120   0.825   0.874
58   0.864   0.0087   0.846   0.882
60   0.873   0.0087   0.859   0.893
62   0.881   0.0064   0.867   0.894
64   0.900   0.0016   0.898   0.905
66   0.898   0.0082   0.878   0.913
68   0.907   0.0070   0.887   0.920
70   0.915   0.0080   0.892   0.932
72   0.921   0.0062   0.906   0.935
74   0.928   0.0073   0.913   0.943
76   0.933   0.0067   0.921   0.950
78   0.939   0.0051   0.928   0.948
80   0.943   0.0077   0.928   0.955
82   0.947   0.0054   0.936   0.959
84   0.951   0.0061   0.935   0.963
86   0.955   0.0063   0.937   0.968
88   0.959   0.0036   0.948   0.965
90   0.962   0.0046   0.952   0.972
92   0.966   0.0032   0.959   0.972
94   0.969   0.0058   0.958   0.982
96   0.972   0.0024   0.967   0.977
Appendix A (Cont'd)


      Given Probability = .8              Given Probability = .9
 N   Avg      Std      Min     Max       Avg     Std      Min     Max
10   0.573    0.0149   0.544   0.612     0.851   0.0096   0.832   0.877
12   0.636    0.0108   0.608   0.661     0.897   0.0095   0.871   0.916
14   0.687    0.0121   0.660   0.706     0.933   0.0062   0.919   0.945
16   0.736    0.0121   0.714   0.763     0.958   0.0025   0.952   0.963
18   0.770    0.0126   0.744   0.799     0.972   0.0047   0.960   0.982
20   0.807    0.0088   0.781   0.825     0.984   0.0028   0.977   0.991
22   0.841    0.0107   0.820   0.866     0.990   0.0027   0.985   0.996
24   0.873    0.0093   0.849   0.891     0.994   0.0020   0.989   0.997
26   0.894    0.0119   0.870   0.923     0.996   0.0020   0.992   1.000
28   0.912    0.0089   0.896   0.930     0.998   0.0010   0.996   1.000
30   0.928    0.0080   0.911   0.946     0.999   0.0010   0.997   1.000
32   0.940    0.0058   0.931   0.948     1.000   0.0005   0.999   1.000
34   0.950    0.0074   0.935   0.966     0.999   0.0009   0.997   1.000
36   0.960    0.0058   0.948   0.971     1.000   0.0006   0.998   1.000
38   0.968    0.0055   0.952   0.978     1.000   0.0004   0.999   1.000
40   0.974    0.0037   0.967   0.981     1.000   0.0000   1.000   1.000
42   0.978    0.0047   0.966   0.992     1.000   0.0003   0.999   1.000
44   0.983    0.0036   0.975   0.990     1.000   0.0002   0.999   1.000
46   0.986    0.0031   0.979   0.994     1.000   0.0000   1.000   1.000
48   0.989    0.0037   0.981   0.994     1.000   0.0000   1.000   1.000
50   0.991    0.0025   0.984   0.996     1.000   0.0000   1.000   1.000
52   0.993    0.0022   0.988   0.996     1.000   0.0000   1.000   1.000
54   0.994    0.0025   0.989   0.999     1.000   0.0000   1.000   1.000
56   0.996    0.0015   0.993   0.999     1.000   0.0000   1.000   1.000
58   0.996    0.0015   0.992   0.999     1.000   0.0000   1.000   1.000
60   0.997    0.0017   0.993   1.000     1.000   0.0000   1.000   1.000
62   0.997    0.0019   0.993   1.000     1.000   0.0000   1.000   1.000
64   0.997    0.0002   0.997   0.998     1.000   0.0000   1.000   1.000
66   0.998    0.0013   0.995   1.000     1.000   0.0000   1.000   1.000
68   0.998    0.0011   0.996   1.000     1.000   0.0000   1.000   1.000
70   0.999    0.0011   0.996   1.000     1.000   0.0000   1.000   1.000
72   0.999    0.0008   0.997   1.000     1.000   0.0000   1.000   1.000
74   0.999    0.0006   0.998   1.000     1.000   0.0000   1.000   1.000
76   0.99896  0.0010   0.997   1.000     1.000   0.0000   1.000   1.000
78   0.99930  0.0008   0.998   1.000     1.000   0.0000   1.000   1.000
80   0.99952  0.0006   0.998   1.000     1.000   0.0000   1.000   1.000
82   0.99954  0.0005   0.999   1.000     1.000   0.0000   1.000   1.000
84   0.99962  0.0005   0.998   1.000     1.000   0.0000   1.000   1.000
86   0.99966  0.0006   0.998   1.000     1.000   0.0000   1.000   1.000
88   0.99976  0.0004   0.999   1.000     1.000   0.0000   1.000   1.000
90   0.99970  0.0005   0.999   1.000     1.000   0.0000   1.000   1.000
92   0.99976  0.0004   0.999   1.000     1.000   0.0000   1.000   1.000
94   0.99976  0.0004   0.999   1.000     1.000   0.0000   1.000   1.000
96   0.99952  0.0005   0.999   1.000     1.000   0.0000   1.000   1.000
IMPROVING USE OF STATISTICS IN ARMY TEST AND EVALUATION

ABSTRACT

Three topics are discussed. First, there is a need for more well-trained statisticians in the Department of Defense. Second is an outline of an approach to the problem of operational testing of a system for potential use in many environments, when tests can be carried out in very few (one, two, or three) environments. This approach suggests the use of multidimensional scale analysis as one of the tools with which to reduce the scope of the problem. Finally, the third topic is "How large should the sample size be?" It is shown how small sample sizes make it difficult to demonstrate reliability with confidence. For testing hypotheses where sample sizes have to be decided nonsequentially, in advance of experimentation, a Bayesian approach is helpful. The use of normal theory approximations allows one to use some insightful graphs for approximate solutions.


INTRODUCTION 

I  was  somewhat  surprised  to  discover  the  title  of  my  lecture,  since  it  suggests  a  more  global 
view  than  I  am  accustomed  to  taking.  My  preference  is  to  concentrate  on  a  rather  narrow  topic 
and  hope  that  the  discussion  of  that  will  suggest  wider  applications.  In  view  of  the  title,  let  me 
address  three  rather  separate  subjects.  These  are  the  role  of  statisticians  in  defense,  a  special 
problem  in  operational  testing,  and  a  problem  of  hypothesis  testing. 

Since  I  will  be  preaching  to  the  converted  on  the  first  topic,  I  will  keep  that  brief.  The 
operational testing problem, sometimes called “Dubin’s challenge,” is that of selecting a few
(two  or  three)  testing  environments  in  which  to  test  a  system  which  is  potentially  required  to 
function  in  many  environments.  The  third  topic  involves  a  couple  of  examples  which  address 
the  question,  “How  large  should  the  sample  size  be?”,  a  question  which  comes  up  frequently  in 
testing. 

The  last  two  topics  might  be  more  properly  entitled  “Is  there  a  free  lunch?”  I  suspect 
that  some  administrators  responsible  for  allocating  funds  for  testing  may  find  some  of  the 
conclusions  disturbing. 

Lest  I  be  mistaken  for  more  of  an  expert  than  I  am  on  this  topic,  let  me  describe  my 
background.  I  am  a  Professor  of  Statistics  who  worked  on  an  applied  ONR  contract  for  many 
years  during  which  I  had  contact  with  a  variety  of  applications  of  statistics  in  defense  work.  I 
now  serve  as  a  member  of  an  NRC  (National  Research  Council)  panel  on  Operational  Testing 
which  has  been  studying  and  hopes  soon  to  report  on  several  issues  in  Operational  Testing. 
From  these  experiences  I  have  some  exposure  to  the  real  problems  in  Army  Test  and  Evaluation 
but that exposure lacks the depth that comes from the healthy experience of being forced to
deal  with  specific  examples  from  beginning  to  end. 
STATISTICIANS IN DEFENSE


A  defense  department  operational  test  of  a  system  is  typically  expensive  and  involves  the 
use  of  talented  physicists  and  engineers  to  devise  and  install  appropriate  sensors.  It  is  a 
common  saying  among  physicists  that  if  an  experiment  needs  statistical  analysis,  it  is  the 
wrong  experiment.  That  saying  is  based  on  a  historical  luxury  in  science,  where  the  major 
cost  of  experimentation  was  that  of  setting  up  the  experiment.  After  the  experiment  is  set 
up,  it  is  relatively  costless  to  replicate  it  as  many  times  as  necessary  to  get  a  desired  level  of 
accuracy.  That  luxury  is  no  longer  as  available  as  it  used  to  be,  and  it  certainly  is  not  available 
in  operational  testing.  But  a  consequence  of  the  attitude  described  above  is  that  most  scientists 
are statistically naive and unaffected by most of the twentieth-century revolutions in statistical
theory  and  practice.  The  advances  in  experimental  design,  sequential  analysis  and  decision 
theory,  among  many  others,  are  not  appreciated  by  many  of  the  decision  makers  in  operational 
testing. 

If we examine the types of statistical issues that arise and the personnel available to deal with these problems, there seems to be a mismatch. Rather few of the people who are responsible for facing such issues have more than a trivial background in statistics. Under proper guidance, they can be trained to deal with a variety of standard problems. However, issues of experimental design abound, and there are very few people with enough talent to absorb the results of a three-day workshop on that topic and apply them creatively. Some healthy and sustained exposure to the theory and practice of statistics is almost always necessary to be successful.

Finally, the real world involves unusual and unexpected variations of standard problems. Dealing with these problems requires the training and talent to recognize which rules could and should be broken and how to adapt.

In summary, the Defense Department would profit from employing more well-trained and capable statisticians. Statistical laymen with the benefit of a handbook or two and a couple of three-day workshops will rarely be able to do the job without experienced backup.

EXPERIMENTAL DESIGN IN OPERATIONAL TESTING UNDER LIMITED EXPERIMENTATION


INTRODUCTION 

How  should  one  treat  the  problem  of  testing  a  type  of  equipment  in  the  field  when  the 
equipment  is  expected  to  be  used  in  several  of  a  large  variety  of  potential  environments  and 
funds  are  only  available  to  test  under  very  few  environments?  In  the  following  I  describe  an 
approach  to  this  problem  which,  unfortunately,  fails  to  deal  with  one  of  the  major  functions 
of  operational  testing.  That  function  is  that  of  discovering  the  surprises  that  quickly  locate 
unanticipated  but  glaring  weaknesses,  the  removal  of  which  makes  for  an  improved  product. 
The  approach  is  described  through  an  example.  While  the  example  is  artificial,  I  believe  that 
it  is  sufficiently  realistic  to  permit  discussion  of  the  important  ideas.  After  presenting  the 
“results,” I will review the various steps to indicate issues and alternatives. In the end this
presentation  can  serve  as  a  basis  for  soliciting  a  slightly  more  realistic  problem  on  which  the 
issues  can  be  examined  with  more  care. 

Finally,  this  problem  has  ramifications  in  a  wide  range  of  applications.  For  example,  in 
testing  software,  one  may  subject  the  software  to  thousands  of  test  scenarios.  Nevertheless,  the 
set of possible applications is enormously larger than what we may be able to apply in a test
with  limited  time. 

THE  EXAMPLE 

The example is an electric generator which may be required to function in many environments. We shall list 8 possible environments and evaluate these by using numerical values between 0 and 10 for each of 18 stress variables. High values indicate large perceived stress. Thus our stress matrix is an 18 × 8 matrix $A = \|a_{ij}\|$, listed in Table 1, with a description of the rows (variables) and columns (environments) in Table 2. We use a measure of distance or dissimilarity $d_{jj'}$ between columns j and j'. As discussed below, it is a Euclidean type of distance that weights each variable equally. This gives $D = \|d_{jj'}\|$ in Table 3.
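A minimal sketch of how D could be computed, assuming the dissimilarity is the plain Euclidean distance between columns of A. That assumption follows the later description of a Euclidean type of distance that weights each variable equally. The stress matrix below is a random placeholder, not the values of Table 1, and the function name is hypothetical.

```python
import numpy as np

def dissimilarity_matrix(A):
    """Assumed Euclidean distance between columns (environments) of a stress matrix.

    A has one row per stress variable and one column per environment, so
    d[j, k] = sqrt(sum_i (A[i, j] - A[i, k])**2), every variable weighted equally.
    """
    A = np.asarray(A, dtype=float)
    diff = A[:, :, None] - A[:, None, :]      # shape (variables, env, env)
    return np.sqrt((diff ** 2).sum(axis=0))

# Hypothetical placeholder: 18 stress variables x 8 environments, integer scores 0-10.
# These are NOT the values of Table 1 in the paper.
rng = np.random.default_rng(1)
A = rng.integers(0, 11, size=(18, 8))
D = dissimilarity_matrix(A)
```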

The Splus routine cmdscale(D, k = 2, eig = F, add = F) maps the 8 environments onto a two-dimensional plane based on the distances. The result is a matrix $Y = \|y_{ij}\|$, where i represents the environment and j the coordinate in 2 dimensions. This matrix Y is presented in the first two columns of Table 4. The last 3 columns give three quantities for each environment: a preliminary weight $w_i$ indicating the importance of success in this environment; pr, which is proportional to the prior probability of facing this environment; and their product, which will be referred to as the weight w. The eight points are plotted in Figure 1 and circled.
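cmdscale performs classical (Torgerson) multidimensional scaling. For readers without S-PLUS, the following NumPy sketch implements classical scaling, assuming add = F (no additive constant). The function name `classical_mds` is hypothetical. Its coordinates should agree with cmdscale's only up to the sign of each axis.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical scaling: k-dimensional coordinates whose distances approximate D.

    Double-centres the squared distances, B = -1/2 J D^2 J, and keeps the k
    largest eigenpairs, scaling each eigenvector by sqrt(eigenvalue).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n              # centring matrix
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)                   # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]               # indices of the k largest
    vals, vecs = vals[order], vecs[:, order]
    return vecs * np.sqrt(np.clip(vals, 0.0, None))  # one row per environment
```

When D really comes from points in a k-dimensional Euclidean space, the returned coordinates reproduce D exactly. Otherwise they are the best rank-k fit in the classical-scaling sense.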

On the assumption that only two tests will be permitted, we select two points $x_1 = (x_{11}, x_{12})$ and $x_2 = (x_{21}, x_{22})$ so as to optimize a criterion. The criterion to be maximized here is

$$V = \min_j \left[ I(x_1, j) + I(x_2, j) \right],$$

where $I(x, j)$ is the information that an experiment at x contributes to the j-th environment. For this discussion let us assume that

$$I(x, j) = \exp\left(-b \, \|x - y_j\|\right),$$

where $y_j$ is the location in the two-dimensional space of the point corresponding to the j-th environment. The optimizing points $x_1$ and $x_2$ and the corresponding value of V depend on the value of b. Table 5 represents the dependence on b. These points are connected in Figure 1.
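One rough way to explore the criterion V numerically is a grid search over pairs of candidate test locations. This sketch makes three assumptions. It treats V as the unweighted minimum over environments of I(x1, j) + I(x2, j). It takes I(x, j) = exp(-b ||x - y_j||). It searches only a coarse grid spanning the environments. The paper does not say which optimizer produced Table 5. The function names and the demo coordinates are hypothetical.

```python
import numpy as np

def info(x, Y, b):
    """I(x, j) = exp(-b * ||x - y_j||) for every environment j (rows of Y)."""
    return np.exp(-b * np.linalg.norm(Y - x, axis=-1))

def criterion(x1, x2, Y, b):
    """V = min_j [I(x1, j) + I(x2, j)]: information for the worst-served environment."""
    return float(np.min(info(x1, Y, b) + info(x2, Y, b)))

def best_pair(Y, b, grid_size=21):
    """Crude grid search (an assumption, not the paper's method) for the pair maximizing V."""
    Y = np.asarray(Y, dtype=float)
    lo, hi = Y.min(axis=0), Y.max(axis=0)
    gx = np.linspace(lo[0], hi[0], grid_size)
    gy = np.linspace(lo[1], hi[1], grid_size)
    G = np.array([(u, v) for u in gx for v in gy])                      # candidate points
    I = np.exp(-b * np.linalg.norm(G[:, None, :] - Y[None, :, :], axis=-1))  # (m, envs)
    V = (I[:, None, :] + I[None, :, :]).min(axis=-1)                    # V for every pair
    i, j = np.unravel_index(np.argmax(V), V.shape)
    return G[i], G[j], float(V[i, j])

# Hypothetical 2-D coordinates for three environments (not Table 4).
Y_demo = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 3.0]])
for b in (0.5, 1.0, 2.0):
    x1, x2, v = best_pair(Y_demo, b)
    print(b, x1.round(2), x2.round(2), round(v, 3))
```

Repeating the search for several values of b shows how the optimal pair moves as b changes, which is the kind of dependence Table 5 summarizes. A finer grid or a local optimizer could refine the result.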

ISSUES  AND  ALTERNATIVES 

This approach is painfully lacking in adequate justification. The main reason for not dismissing it out of hand is that the underlying problem is real and demands some resolution. In this section we will review the example step by step, consider the issues raised and alternatives to the methods proposed.

The  first  step  was  the  construction  of  a  stress  matrix  A.  Here  we  have  ignored  one  of  the 
major  contributions  of  operational  testing  (OT).  That  consists  of  the  illuminating  surprises 
that  accompany  OT.  Frankly,  I  don’t  see  how  to  incorporate  that  aspect  in  this  “model.”  To 
construct A we have to employ enough expertise to imagine the various aspects or variables of
the  many  environments  that  might  impact  on  the  quality  of  performance.  It  is  necessary  to 
quantify,  in  some  orderly  fashion,  the  perceived  threat  to  satisfactory  performance  embodied  in 
each  of  these  variables.  Such  a  quantification  will  almost  necessarily  be  partly  subjective  and 
should  depend  at  least  in  part  on  the  results  of  developmental  testing  (DT).  Note  that  in  this 
example  two  of  the  variables  were  hot  and  cold.  It  might  seem  strange  to  list  these  as  separate 
variables,  but  the  stresses  imposed  by  extremes  of  heat  can  be  regarded  as  distinct  in  nature 
from  those  imposed  by  extremes  of  cold,  or  for  that  matter,  of  extreme  shifts  from  cold  to  hot. 

Implicit  in  the  quantification  of  stress  is  that  A  can  be  used  to  generate  a  measure  of 
distance  or  dissimilarity  between  pairs  of  environments.  The  measure  of  distance  used  here,  to 
generate  the  matrix  D,  is  naive.  It  might  be  that  the  expert  could  bypass  A  and  go  directly  to 
D.  Otherwise  he  might  find  some  reasonable  alternative  to  our  definition  of  D.  Implicitly,  the 
definition  used  here  weights  each  variable  as  heavily  as  every  other  and  constructs  a  Euclidean 
type  of  distance.  If  some  of  our  stress  variables  were  highly  correlated  because  they  tend  to 
measure  the  same  underlying  factor,  our  measure  D  could  effectively  give  this  factor  more 
impact  than  other  equally  important  factors.  That  phenomenon  can  be  compensated  for,  if 
it  is  understood,  by  replacing  the  squared  distance  by  some  other  quadratic  form  or  by  some 
other  metric  or  measure  altogether. 
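To illustrate replacing the squared distance by another quadratic form, the sketch below computes d_jj'^2 = (a_j - a_j')^T M (a_j - a_j') for a user-chosen symmetric positive semidefinite matrix M. The example is hypothetical. Two stress variables duplicate each other, so giving each a weight of 1/2 restores the underlying factor's influence to that of a single variable.

```python
import numpy as np

def quadratic_form_distance(A, M):
    """d[j, k] = sqrt((a_j - a_k)^T M (a_j - a_k)), a_j = column j of A (one environment).

    M is a symmetric positive semidefinite weight matrix over the stress
    variables; M = identity reproduces the equal-weight Euclidean distance.
    """
    A = np.asarray(A, dtype=float)
    M = np.asarray(M, dtype=float)
    diff = A[:, :, None] - A[:, None, :]                  # (variables, env, env)
    sq = np.einsum("ijk,il,ljk->jk", diff, M, diff)       # quadratic form for each pair
    return np.sqrt(np.clip(sq, 0.0, None))                # clip tiny negative round-off

# Hypothetical example: rows 0 and 1 measure the same factor (identical scores),
# so each gets weight 1/2 and together they count as one variable.
A = np.array([[1.0, 4.0, 2.0],
              [1.0, 4.0, 2.0],
              [0.0, 3.0, 5.0]])
M = np.diag([0.5, 0.5, 1.0])
D_weighted = quadratic_form_distance(A, M)
D_plain = quadratic_form_distance(A, np.eye(3))
```

Here D_plain counts the duplicated factor twice, while D_weighted matches the distance computed from the two distinct variables alone.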

With our measure of dissimilarity, we are effectively measuring distances between points in an 18-dimensional Euclidean space. Each environment is represented by one of these points. Other measures of dissimilarity may not be able to be mapped into points in such a space. In any case, it is difficult to comprehend any analysis involving such high dimensionality. A number of techniques have appeared in the statistical literature to cope with representing high-dimensional phenomena in terms of a low-dimensional Euclidean space. These methods go under various names and are considered to be variations of “Factor Analysis.”

One of the earliest such methods is called Principal Components. This technique effectively projects a set of points in n dimensions onto the closest k < n dimensional space. The meaningfulness of the result depends on the relevance of Euclidean distance in the original n-dimensional space. The classical methods of Factor Analysis involve an assumption that the data are noisy observations, the means of which are linear functions of k underlying factors. The noise on each observed data point is assumed to be independent of the noise on the others. Then there is a set of methods of “scale analysis,” which try to map a dissimilarity matrix onto a low-dimensional Euclidean space so that the distances between points in the Euclidean space are close to the dissimilarities.
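For concreteness, a Principal Components projection can be written in a few lines with the singular value decomposition. This is a generic sketch, and the function name is hypothetical. It projects onto the closest k-dimensional affine subspace, that is, the one passing through the centroid of the points. When the dissimilarities are ordinary Euclidean distances, classical scaling such as cmdscale yields the same configuration as these scores, up to the signs of the axes.

```python
import numpy as np

def principal_components(X, k=2):
    """Scores of the rows of X (points in n dimensions) on the top-k principal axes.

    Centres the data, takes the SVD, and projects onto the first k right
    singular vectors, i.e. the best-fitting k-dimensional subspace through the mean.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```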

In  practice,  these  methods  involve  getting  results  for  low  dimensions,  and  comparing  how 
well  they  work  for  low  values  of  k  with  the  next  higher  value.  In  our  example,  I  applied  cmdscale 
which is a scale analysis method in Splus, for k = 2 without taking the time to see if using
k  =  3  would  be  much  of  an  improvement  as  measured  by  a  “stress”  criterion.  Once  the  points 
are  mapped  into  a  low  dimensional  space,  the  analyst  often  tries  to  label  certain  directions  in 
the  k  dimensional  space  as  measuring  certain  underlying  factors.  The  classical  factor  analysis 
methods  come  with  a  variety  of  techniques  for  rotating  and  labeling  important  factors.  I  have 
tended  to  be  skeptical  of  these  techniques,  but  in  applied  fields  like  psychology,  the  naming  of 
these  factors  may  be  valuable  and  contribute  to  insight. 
My general attitude toward all of these approaches is that they serve a useful purpose in suggesting insights that may well be worthwhile pursuing systematically in other ways. These are scattergun techniques which may hit an interesting target, but they are guaranteed neither to work nor to give meaningful results when results appear.

The  ability  to  label  certain  directions  with  interpretations  that  make  sense  to  the  user 
would  be  of  great  importance  for  the  analyst  who  has  to  communicate  with  a  decision  maker 
who  is  necessarily  reluctant  to  decide  on  the  basis  of  the  output  of  a  black  box  or  mysterious 
algorithm. 

Returning to our example, plotting the points on a plane (2-dimensional space) in Figure 1, we have labeled the points by the environment number and the weight. Moving diagonally from the upper left-hand corner, one seems roughly to be moving from a temperate to an intemperate environment. Moving from left to right seems to be going from a humid to a dry environment.
These  characteristics  are  not  nearly  a  complete  description  of  the  environments,  and  it  is  of 
value  to  keep  the  labels  handy  to  remind  one  of  the  actual  environment. 

We would hope that the lower dimensional representation would help in describing how much information an experiment in one environment gives to a user who is interested in another environment. One possibility is to ask an expert how much information can be obtained from an experiment in Saudi Arabia for use in a temperate urban environment, and vice versa. With introspection one could conceivably construct an information matrix $I(x,y)$ representing the information from an experiment at $x$ for use at $y$. This matrix need not be square: we could have more or fewer values of $x$ than of $y$. Presumably the closer $x$ is to $y$, the greater the value of $I(x,y)$. Actually that need not be the case when we consider the potential advantages of accelerated stress testing; for the time being let us defer that issue. In this example, we have assumed that $I(x,y)$ is a function of the distance between the representations of $x$ and $y$ in the two-dimensional space onto which the environments have been mapped. We have assumed that $I$ is a decreasing function of the distance, and in particular that it can be represented by $\exp(-b\,\|x - y\|)$, where $b$ is a parameter to be selected. That choice was rather arbitrary, and it would make sense instead to ask experts for their assessment of $I$ and use that to fit some reasonable function.
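To see how the parameter $b$ governs the assumed information, the sketch below tabulates $I(x,y) = \exp(-b\|x-y\|)$ among the eight mapped environments, using the Table 4 coordinates. It assumes NumPy; `information` and `information_matrix` are illustrative names, and the exponential form is the same arbitrary choice made in the text.

```python
import numpy as np

# Two-dimensional coordinates of environments 1..8 (Table 4).
COORDS = np.array([
    [-0.89, -2.47], [-7.88, -6.95], [7.03, -5.25], [10.06, -0.08],
    [-3.22, 0.52], [-1.81, 4.90], [2.80, 5.40], [-6.08, 3.93],
])


def information(x, y, b):
    """Assumed information from a test at x for use at y: exp(-b * ||x - y||)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(np.exp(-b * np.linalg.norm(x - y)))


def information_matrix(points, b):
    """I[i, j] = assumed information from a test at points[i] for use at points[j]."""
    diff = points[:, None, :] - points[None, :, :]
    return np.exp(-b * np.linalg.norm(diff, axis=-1))


I = information_matrix(COORDS, b=0.5)
print(np.round(I, 3))
```

With $b = 0$ every entry equals 1, so location is irrelevant; as $b$ grows, the off-diagonal entries shrink and only nearby test sites remain informative.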

The next step was to construct a criterion of what would be a good design. Here the word "design" represents a choice of several experimental environments $x$. For the sake of the example, I decided to use two experimental environments. This choice of 2 need not equal the dimensionality of the space onto which the environments were mapped. I also weighted each $x$ equally, calculating the cumulated information at a given point $y$ by summing the informations. In practice, we may decide to spend more assets or money on one test than on another. In that case we would not simply sum $I(x_1,y)$ and $I(x_2,y)$, where $x_1$ and $x_2$ are the two environments used for testing; we could use a weighted sum which takes into account how many assets were used as well as the environment. Thus the restrictions to two test locations and equal weighting are not essential and could easily be modified. To return to the criterion, I selected that of optimizing the worst that could happen, where the worst is defined as the minimum, over all possible environments $y$, of the cumulated information at $y$ divided by the weight at $y$:

$$V(x_1, x_2) = \min_y \frac{I(x_1,y) + I(x_2,y)}{w(y)},$$

which the design $(x_1, x_2)$ is chosen to maximize. Thus a low value of information at an important environment would be much worse than a low value at the North Pole, which I assumed to be of little military importance for this example.
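The maximin criterion can be evaluated directly for any candidate design. The sketch below assumes NumPy, the Table 4 coordinates and weights $w$, and the exponential information function; equal allocation is the default, and the `alloc` argument is a hypothetical illustration of the weighted-sum variant mentioned in the text. It returns $V$, the quantity tabulated in Table 5, but does not perform the maximization over $(x_1, x_2)$.

```python
import numpy as np

# Two-dimensional coordinates of environments 1..8 (Table 4).
COORDS = np.array([
    [-0.89, -2.47], [-7.88, -6.95], [7.03, -5.25], [10.06, -0.08],
    [-3.22, 0.52], [-1.81, 4.90], [2.80, 5.40], [-6.08, 3.93],
])
# Weight w = importance x prior probability (last column of Table 4).
W = np.array([15, 6, 1, 1, 25, 15, 2, 25], dtype=float)


def design_value(x1, x2, b, coords=COORDS, weights=W, alloc=(1.0, 1.0)):
    """V = min over environments y of [a1*I(x1, y) + a2*I(x2, y)] / w(y)."""
    d1 = np.linalg.norm(coords - np.asarray(x1, float), axis=1)
    d2 = np.linalg.norm(coords - np.asarray(x2, float), axis=1)
    cumulated = alloc[0] * np.exp(-b * d1) + alloc[1] * np.exp(-b * d2)
    return float(np.min(cumulated / weights))


# Designs reported in Table 5 for b = 0 and b = 0.2.
print(design_value((-3.22, 0.52), (-6.08, 3.93), b=0.0))
print(design_value((-2.88, -0.33), (-5.83, 3.65), b=0.2))
```

At $b = 0$ every design gives $2/25 = 0.08$, the Table 5 entry, so all designs tie there; for $b = 0.2$ the reported design gives a value of about 0.051.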
For my example I gave positive weights only to the 8 environments used for the mapping. That need not be the case. I also considered the possibility of creating a test in any environment in the two-dimensional space. That may not be feasible; we might be more limited. It may be difficult to construct an environment that would be mapped into a given point in the two-dimensional space. Possibly more embarrassing, there may be too many different real environments that would correspond to a given point in the two-dimensional space.

Table 5 shows the impact of changing the parameter $b$. When $b$ is small, the impact of distance is slight, and it is important to make sure that the important or highly weighted points get maximum information. Then the optimal design puts the two $x$ values at the two most highly weighted environments. When $b$ is large, information diminishes greatly with distance from the testing point. Then it is necessary to move the $x$ values to compromise positions which do not downgrade too much the performance at less important places.

While this behavior may seem sensible, it depends heavily on accepting the proposed criterion. Both the exponential decline and the maximin aspects should be questioned and alternatives considered. This ignores even the possibility of replacing the information $I(x,y)$ by a higher dimensional measure such as an information matrix. At this stage of sophistication, it seems premature to consider this latter extension.

The issue of accelerated testing has not yet been addressed here. If it were, information might not decrease as $x$ moves away from $y$. One possibility is to add a dimension for stress. However, the North Pole and Saudi Arabia represent high-stress environments, at least with respect to certain variables, and one might be able to incorporate these high stresses without going to another dimension.

HOW  LARGE  SHOULD  THE  SAMPLE  BE? 

We consider two examples. The first describes some of the consequences of selecting a given sample size in establishing a rather modest 80% confidence statement on the unknown reliability of a system. The second deals with finding the appropriate sample size for testing whether the mean of a normal distribution exceeds a desired threshold value. A special characteristic of the latter problem is that the sample size must be selected in advance of experimentation. A subsequent criticism suggests the potential benefit of a multiple or sequential sampling plan.

CONFIDENCE  BOUNDS  ON  RELIABILITY  FOR  A  GIVEN  SAMPLE  SIZE 

Some of the underlying problems with deciding the sample size needed to determine reliability can be understood in terms of Table 6, which is devoted to providing 80% confidence that a given reliability is at least 80%. Suppose that $n$ items are tested in a success/failure mode to determine whether there is 80% confidence that the system has reliability at least 80%. Let $s$ be the number of successes required to pass the test.

First we note that to pass the test, $n$ must be at least 8: with $n = 7$, even 7 successes out of 7 give an exact 80% lower confidence bound of only $0.2^{1/7} \approx 0.795$. Let $r$ be the actual reliability required for us to be 80% sure of passing this test. Also let $q$ be the reliability below which the probability of passing the test is less than 0.10. Finally, let $t$ be the probability of passing the test when the reliability is exactly 80%.
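The quantities $r$, $q$ and $t$ follow exactly from the binomial distribution once $n$ and the pass mark $s$ are fixed. The sketch below, using only the Python standard library, assumes a test is passed when at least $s$ of the $n$ items succeed; it takes $s$ from Table 6 as given (it does not attempt to reproduce the rule used to choose $s$) and solves for $r$ and $q$ by bisection. The function names are illustrative.

```python
from math import comb


def pass_prob(n, s, p):
    """P(at least s successes in n trials) when the true reliability is p."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(s, n + 1))


def solve_reliability(n, s, target, tol=1e-10):
    """Reliability p with pass_prob(n, s, p) == target; pass_prob increases in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if pass_prob(n, s, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def table6_row(n, s):
    """(r, q, t): 80%-sure-to-pass reliability, 10%-pass reliability, P(pass | 0.8)."""
    r = solve_reliability(n, s, 0.80)
    q = solve_reliability(n, s, 0.10)
    t = pass_prob(n, s, 0.80)
    return r, q, t


for n, s in [(8, 8), (10, 10), (25, 22), (50, 42), (100, 83), (400, 327)]:
    print(n, s, [round(v, 3) for v in table6_row(n, s)])
```

For $n = s = 8$ it gives $r \approx 0.9725$, $q \approx 0.750$ and $t \approx 0.168$, in line with the first row of Table 6.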

The  irregularities  in  the  trends  in  Table  6  derive  from  the  discrete  nature  of  the  binomial 
distribution. 
In any case, a very high reliability $r$ is required to be moderately sure of passing the test for $n < 50$. Also, the probability of passing the test decreases rapidly as the reliability decreases.

A  TESTING  PROBLEM 

While operational testing is not often presented as a pass/fail test of a system, the problem we pose here is exactly that. Although it is phrased in terms of normally distributed observations, the illustrative example will involve a binomial problem. The same normal problem has been solved by Grundy et al. in a slightly different context. It was also discussed by Raiffa and Schlaifer. The presentation of the results here differs from those in the other publications and fits in better with a sequential version of this problem, a major topic of Chernoff.

Let $X_1, X_2, \ldots, X_n$ be independent identically distributed normal random variables with unknown mean $\mu$ and known variance $\sigma^2$. The unknown parameter $\mu$ has the prior normal distribution $N(\mu_0, \sigma_0^2)$ with known mean $\mu_0$ and known variance $\sigma_0^2$. It is desired to decide whether $\mu > 0$ or $\mu < 0$, and the cost of an incorrect decision is $k|\mu|$, where $k$ is known. The cost of $n$ observations is $cn$, where $c$ is known. How large should $n$ be?

POSTERIOR  DISTRIBUTION  AND  RISK 

The posterior distribution of $\mu$ given the data $X_1, X_2, \ldots, X_n$ is $N(Y, s)$ where

$$Y = \frac{\sigma_0^{-2}\mu_0 + n\sigma^{-2}\bar X}{\sigma_0^{-2} + n\sigma^{-2}} \qquad (1)$$

is the Bayes posterior estimate of the unknown mean $\mu$, $\bar X$ is the average of the $n$ observations, and

$$s^{-1} = \sigma_0^{-2} + n\sigma^{-2} \qquad (2)$$

is the precision of the posterior estimate $Y$ of $\mu$. Note that $Y$ is a weighted average of the prior mean $\mu_0$ and the average $\bar X$, weighted by their precisions $\sigma_0^{-2}$ and $n\sigma^{-2}$. In a sense the prior distribution corresponds to information gathered from

$$n_0 = \sigma^2/\sigma_0^2$$

observations averaging $\mu_0$.
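Equations (1) and (2) amount to a precision-weighted update. A minimal sketch in plain Python follows (names are illustrative). With the values used later in the binomial illustration ($\mu_0 = 5$, $\sigma_0 = 8$, $\sigma = 40$) the prior is worth $n_0 = 25$ observations, so 25 observations averaging a hypothetical $\bar X = 15$ would move the posterior mean halfway, to 10.

```python
def posterior(mu0, sigma0, sigma, xbar, n):
    """Posterior mean Y (eq. 1) and variance s (eq. 2) of mu in the normal model."""
    prior_precision = sigma0 ** -2
    data_precision = n * sigma ** -2
    s = 1.0 / (prior_precision + data_precision)              # eq. (2)
    y = s * (prior_precision * mu0 + data_precision * xbar)   # eq. (1)
    return y, s


def prior_equivalent_observations(sigma, sigma0):
    """n0 = sigma^2 / sigma0^2: the number of observations the prior is worth."""
    return sigma ** 2 / sigma0 ** 2


print(prior_equivalent_observations(40.0, 8.0))   # 25 observations
print(posterior(5.0, 8.0, 40.0, 15.0, 25))         # equal weights: mean 10, variance 32
```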

The appropriate decision rule for this symmetric problem is to decide $\mu > 0$ if and only if $Y > 0$. The posterior risk associated with this procedure is

$$r_n(Y) = \int_{-\infty}^{0} \frac{1}{\sqrt{2\pi s}}\, e^{-(\mu - Y)^2/2s}\, k|\mu|\, d\mu + cn$$

if $Y > 0$. In general, $r_n(Y)$ is an even function of $Y$, and

$$r_n(Y) = k s^{1/2}\psi(\alpha) + cn \qquad (3)$$

where

$$\alpha = Y/\sqrt{s}$$

is the number of standard deviations of the posterior estimate $Y$ from 0,

$$\psi(\alpha) = \phi(\alpha) - |\alpha|\{1 - \Phi(|\alpha|)\},$$

and $\phi$ and $\Phi$ are the standard normal density and cumulative distribution functions, i.e.,

$$\phi(\alpha) = (2\pi)^{-1/2}\exp(-\alpha^2/2)$$

and

$$\Phi(\alpha) = \int_{-\infty}^{\alpha} \phi(v)\,dv.$$
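The closed form (3) can be checked against the defining integral. The sketch below, using only Python's `math` module, evaluates $\psi$, the posterior risk (3), and a midpoint-rule approximation of the integral over the "wrong" side of 0; the truncation at 12 posterior standard deviations and the step count are arbitrary choices for the illustration, and the names are hypothetical.

```python
import math


def phi(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def Phi(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def psi(a):
    """psi(a) = phi(a) - |a| * (1 - Phi(|a|)), an even function of a."""
    a = abs(a)
    return phi(a) - a * (1.0 - Phi(a))


def posterior_risk(y, s, k, c, n):
    """Eq. (3): k * sqrt(s) * psi(Y / sqrt(s)) + c * n."""
    return k * math.sqrt(s) * psi(y / math.sqrt(s)) + c * n


def posterior_risk_numeric(y, s, k, c, n, steps=100_000):
    """Midpoint-rule integral of k|mu| over the wrong side of 0 under N(Y, s)."""
    sd = math.sqrt(s)
    if y >= 0:                      # decide mu > 0, so mu < 0 is the error
        lo, hi = min(y - 12 * sd, 0.0), 0.0
    else:                           # decide mu < 0, so mu > 0 is the error
        lo, hi = 0.0, max(y + 12 * sd, 0.0)
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        mu = lo + (i + 0.5) * h
        density = math.exp(-((mu - y) ** 2) / (2 * s)) / math.sqrt(2 * math.pi * s)
        total += density * k * abs(mu) * h
    return total + c * n


print(posterior_risk(0.3, 0.5, 2.0, 0.01, 5), posterior_risk_numeric(0.3, 0.5, 2.0, 0.01, 5))
```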

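Before the data are observed to yield a posterior risk, the Bayes risk is the expectation of $r_n(Y)$. Since, before sampling, $Y$ is normally distributed with mean $\mu_0$ and variance $s_0 - s$,

$$R_n = \int_{-\infty}^{\infty} k s^{1/2}\,\psi\!\left(\frac{\mu_0 + \xi\sqrt{s_0 - s}}{\sqrt{s}}\right)\phi(\xi)\,d\xi + cn \qquad (4)$$

where $s_0 = \sigma_0^2$. Then, writing $cn = c\sigma^2/s - c\sigma^2/\sigma_0^2$ by (2), it can be shown that

$$R_n = k\{s_0^{1/2}\psi(\alpha_0) - s_1^{1/2}\psi(\alpha_1)\} + c\sigma^2/s - c\sigma^2/\sigma_0^2 \qquad (5)$$

where $\alpha_0 = \mu_0/\sqrt{s_0}$, $\alpha_1 = \mu_0/\sqrt{s_1}$ and $s_1 = s_0 - s$.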
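The closed form (5) can likewise be compared with a direct numerical evaluation of (4). The sketch below relies on the fact, expressed by the $\xi$ substitution in (4), that before sampling $Y$ is normal with mean $\mu_0$ and variance $s_0 - s$, and it uses a simple midpoint rule over $\xi \in [-8, 8]$; the names and numerical settings are illustrative assumptions.

```python
import math


def phi(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def Phi(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def psi(a):
    a = abs(a)
    return phi(a) - a * (1.0 - Phi(a))


def bayes_risk(mu0, sigma0, sigma, k, c, n):
    """Eq. (5): k{s0^(1/2) psi(a0) - s1^(1/2) psi(a1)} + c sigma^2/s - c sigma^2/sigma0^2."""
    s0 = sigma0 ** 2
    s = 1.0 / (sigma0 ** -2 + n * sigma ** -2)
    s1 = max(s0 - s, 0.0)
    value = math.sqrt(s0) * psi(mu0 / math.sqrt(s0))
    if s1 > 0.0:                     # no sampling (n = 0) leaves s1 = 0
        value -= math.sqrt(s1) * psi(mu0 / math.sqrt(s1))
    return k * value + c * sigma ** 2 / s - c * sigma ** 2 / sigma0 ** 2


def bayes_risk_numeric(mu0, sigma0, sigma, k, c, n, steps=20_000):
    """Eq. (4): average the posterior risk (3) over Y ~ N(mu0, s0 - s)."""
    s0 = sigma0 ** 2
    s = 1.0 / (sigma0 ** -2 + n * sigma ** -2)
    spread = math.sqrt(max(s0 - s, 0.0))
    h = 16.0 / steps                 # integrate xi over [-8, 8]
    total = 0.0
    for i in range(steps):
        xi = -8.0 + (i + 0.5) * h
        y = mu0 + xi * spread
        total += k * math.sqrt(s) * psi(y / math.sqrt(s)) * phi(xi) * h
    return total + c * n


print(bayes_risk(5.0, 8.0, 40.0, 1.0, 0.01, 36), bayes_risk_numeric(5.0, 8.0, 40.0, 1.0, 0.01, 36))
```

With the binomial illustration's values and $n = 36$ the two functions should agree closely; at $n = 0$ the closed form reduces to the no-data risk $k s_0^{1/2}\psi(\alpha_0)$.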

NORMALIZATION 

Here $R_n$ depends on the known constants $\mu_0$, $\sigma_0$, $\sigma^2$, $k$ and $c$. It is obvious that the number of effective parameters for describing the optimal choice of $n$ can be reduced. For example, $c/k$ is, except for a trivial normalization, more relevant than $c$ and $k$ separately.

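A valuable normalization reducing the number of effective parameters is obtained as follows. Let $\tilde X_i = aX_i$. Then $\tilde\mu = a\mu$, $\tilde\sigma^2 = a^2\sigma^2$, $\tilde\mu_0 = a\mu_0$, $\tilde\sigma_0^2 = a^2\sigma_0^2$, $\tilde Y = aY$, $\tilde s = a^2 s$, $\tilde s_0 = a^2 s_0$, $\tilde s_1 = a^2 s_1$, $\tilde\alpha = \alpha$, $\tilde\alpha_0 = \alpha_0$ and $\tilde\alpha_1 = \alpha_1$. Next

$$R_n = k a^{-1}\{\tilde s_0^{1/2}\psi(\tilde\alpha_0) - \tilde s_1^{1/2}\psi(\tilde\alpha_1)\} + c\tilde\sigma^2/\tilde s - c\tilde\sigma^2/\tilde\sigma_0^2,$$

and selecting $a$ to make $k a^{-1} = c\tilde\sigma^2 = c a^2\sigma^2$, i.e.,

$$a = (k/c\sigma^2)^{1/3},$$

we have

$$(a/k)\,[R_n + c\sigma^2/\sigma_0^2] = \tilde s_0^{1/2}\psi(\tilde\alpha_0) - \tilde s_1^{1/2}\psi(\tilde\alpha_1) + \tilde s^{-1}.$$

Thus the choice of the optimal sample size is essentially that of minimizing

$$\tilde R = \tilde s_0^{1/2}\psi(\tilde\alpha_0) - \tilde s_1^{1/2}\psi(\tilde\alpha_1) + \tilde s^{-1} \qquad (6)$$

with respect to $\tilde s$. We recall that $\tilde s_0$ and $\tilde\alpha_0 = \tilde\mu_0/\tilde s_0^{1/2}$ are fixed, while $\tilde s_1 = \tilde s_0 - \tilde s$ and $\tilde\alpha_1 = \tilde\mu_0/\tilde s_1^{1/2}$ depend on $\tilde s$. Setting $d\tilde R/d\tilde s$ equal to 0 yields

$$\tfrac{1}{2}\,\tilde s_1^{-1/2}\,\phi(\tilde\alpha_1) = \tilde s^{-2} \qquad (7)$$

from which we can derive level lines for the optimal

$$\tilde n = \tilde s^{-1} - \tilde s_0^{-1}. \qquad (8)$$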
(9)


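This result then easily gives the optimal

$$n = \sigma^2/s - \sigma^2/\sigma_0^2 = (\sigma k/c)^{2/3}\,\tilde n$$

as a multiple of $\tilde n$ for each value of $(\tilde\mu_0, \tilde\sigma_0)$ or, equivalently, of $(\tilde t_0, \tilde\alpha_0)$, where

$$\tilde t_0 = \tilde s_0^{-1} = \tilde\sigma_0^{-2} = (\sigma k/c)^{-2/3}(\sigma^2/\sigma_0^2) \qquad (10)$$

and

$$\tilde\alpha_0 = \alpha_0 = \mu_0/\sigma_0.$$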

Because of the normalization, $\tilde n$, and hence the computed $n$, assumes fractional values. The discrete nature of our original problem implies that the appropriate value of $n$ is typically a nearby integer when $\tilde n > 0$, while $n = 0$ when $\tilde n = 0$.

RESULTS 

The results of the calculation of the optimal choice of $\tilde n$ are presented graphically in Figure 2, which shows the curves in the $(\tilde t_0, \tilde\alpha_0)$ plane for which $\tilde n$ takes on the values .02, .04, .06, .08, 1.0, 1.25, 1.50 and 1.75. Because of symmetry, Figure 2 is given only for $\tilde\alpha_0 > 0$. Some consequences of these results are worth emphasizing. It does not pay to sample if $\tilde t_0$ is too large. If $\tilde t_0$ is too small, then a minimal amount of sampling, i.e., $n = 1$, is required. Given that $\tilde t_0 = \tilde s_0^{-1}$ is a measure of the prior precision or information, it is not surprising that large values of $\tilde t_0$ should discourage sampling, especially if $|\mu_0|$ is large. However, it may seem paradoxical that we should not wish to sample much when we are almost completely ignorant, a priori, about $\mu$.

The explanation, from a Bayesian perspective, is that when $\tilde t_0$ is small, $\mu$ is very unlikely to be moderate in size. Thus one observation will be enough to determine whether $\mu$ is highly positive or highly negative. Even if $\mu_0 = 0$, a single observation should suffice as long as we have prior reason to believe that $|\mu|$ is large compared to $\sigma$.

The value of $\tilde n$ can never exceed 1.8064, which it attains when $\tilde\alpha_0 = 0$ and $\tilde t_0 = 0.0904$.

There is a boundary of values $(\tilde t_0, \tilde\alpha_0)$ for which $\tilde n = 0$. On this boundary, there is a sample size $\tilde n > 0$ which gives the same Bayes risk as $\tilde n = 0$. The underlying reason for this bifurcation effect is that the risk, as a function of increasing $\tilde n$ with $\tilde t_0$ and $\tilde\alpha_0$ kept fixed, first increases and then decreases to a minimum value before going to $\infty$. When the minimum value of the risk is below that for $\tilde n = 0$, it pays to sample. As we fix $\tilde\alpha_0$ and change $\tilde t_0$, the location of the minimizing value of $\tilde n$ and the minimum of $\tilde R$ change gradually, and the minimizing value of $\tilde n$ approaches a positive limit as the minimum value of $\tilde R$ approaches the risk corresponding to $\tilde n = 0$.
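Because the risk first rises with $\tilde n$, a root of (7) need not be the global minimum, so the sketch below simply minimizes $\tilde R$ of (6) over a fine grid of $\tilde n$, with $\tilde n = 0$ (do not sample) always a candidate. It uses only Python's `math` module; the grid step and the upper limit of 3 (chosen because the text reports $\tilde n$ never exceeds 1.8064) are assumptions of this illustration, and the names are hypothetical.

```python
import math


def phi(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def Phi(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def psi(a):
    a = abs(a)
    return phi(a) - a * (1.0 - Phi(a))


def normalized_risk(n_tilde, t0, alpha0):
    """Eq. (6) as a function of the normalized sample size n~ (1/s~ = t0~ + n~)."""
    s0 = 1.0 / t0
    s = 1.0 / (t0 + n_tilde)
    s1 = s0 - s
    mu0 = alpha0 * math.sqrt(s0)
    risk = math.sqrt(s0) * psi(alpha0) + 1.0 / s
    if s1 > 0.0:
        risk -= math.sqrt(s1) * psi(mu0 / math.sqrt(s1))
    return risk


def optimal_n_tilde(t0, alpha0, n_max=3.0, step=1e-4):
    """Grid search for the minimizing n~ >= 0; returns 0.0 when sampling does not pay."""
    best_n, best_r = 0.0, normalized_risk(0.0, t0, alpha0)
    for i in range(1, int(round(n_max / step)) + 1):
        n = i * step
        r = normalized_risk(n, t0, alpha0)
        if r < best_r:
            best_n, best_r = n, r
    return best_n


print(optimal_n_tilde(0.09921, 0.625))
```

For the binomial illustration ($\tilde t_0 = 0.09921$, $\tilde\alpha_0 = 0.625$) the search returns a value close to the 0.1431 quoted in the text, and there condition (7) holds approximately; for a large $\tilde t_0$ such as 10 it returns 0.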

BINOMIAL  ILLUSTRATION 

I  propose  to  illustrate  the  solution  with  an  artificial  binomial  example  involving  a  missile 
which  either  succeeds  or  fails.  In  principle,  a  binomial  problem  can  be  solved  directly  and  our 
normal  approximation  to  that  problem  is  unnecessary.  However  it  serves  a  useful  illustrative 
purpose. 

Even  in  a  relatively  simple  binomial  problem  where  the  prior  distribution  of  the  unknown 
probability  of  success,  p,  has  a  beta  distribution,  it  isn’t  possible  to  find  a  normalization  that 
reduces the problem to a graph comparable to Figure 2. Figure 2 is useful for quick and
dirty  answers,  and  overall  insight.  Of  course,  precise  results  for  specific  problems  deserve  more 
detailed  analysis,  especially  when  large  amounts  of  money  are  involved.  If  small  sample  sizes 
are  called  for,  then  the  normal  approximation  may  be  unreliable. 

Example. It is proposed to design a new missile to upgrade the reliability from the current value of 70% to 85% at a cost of 10 billion dollars to produce 1000 missiles. If the new missile achieves a reliability $p$ of only 80%, the effort will have seemed barely worthwhile. Thus we estimate $k = 1$, representing the cost in billions of dollars per percentage point of deviation from 80. The cost per missile tested is 10 million dollars, or $c = .01$ in billions of dollars. Assuming that $100p = \mu + 80$, the observations $X = 100$ on success and 0 on failure have standard deviation $\sigma = 100\sqrt{.2 \times .8} = 40$. The engineers feel that they will reach 85% reliability, and the prior on $\mu$ has mean $\mu_0 = 5$ and standard deviation $\sigma_0 = 8$ (equivalent to $\sigma^2/\sigma_0^2 = 25$ observations).

Then $a = (k/c\sigma^2)^{1/3} = 0.39685$, $\tilde t_0 = (\sigma k/c)^{-2/3}(\sigma^2/\sigma_0^2) = 0.09921$ and $\tilde\alpha_0 = \mu_0/\sigma_0 = 0.625$.

The corresponding value of $\tilde n$ is 0.1431, and the optimal $n = \sigma^2 a^2 \tilde n = 36.06$, which can be rounded off to 36, costing 360 million dollars for testing missiles, not counting the set-up cost.
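The arithmetic of the example can be retraced with a few lines of plain Python. The sketch below implements the scale $a$, equation (10), and $n = (\sigma k/c)^{2/3}\tilde n$; it takes $\tilde n = 0.1431$ as the value reported in the text rather than recomputing it, and the function names are illustrative.

```python
import math


def normalize(k, c, sigma, mu0, sigma0):
    """Scale a, normalized prior precision t0~ (eq. 10) and alpha0~ = mu0 / sigma0."""
    a = (k / (c * sigma ** 2)) ** (1.0 / 3.0)
    t0 = (sigma * k / c) ** (-2.0 / 3.0) * (sigma ** 2 / sigma0 ** 2)
    alpha0 = mu0 / sigma0
    return a, t0, alpha0


def sample_size(n_tilde, k, c, sigma):
    """n = (sigma k / c)^(2/3) * n~, which equals sigma^2 a^2 n~."""
    return (sigma * k / c) ** (2.0 / 3.0) * n_tilde


k, c = 1.0, 0.01                      # billions of dollars per point; per missile tested
sigma = 100 * math.sqrt(0.2 * 0.8)    # 100/0 scoring at reliability 0.8
a, t0, alpha0 = normalize(k, c, sigma, mu0=5.0, sigma0=8.0)
n = sample_size(0.1431, k, c, sigma)  # n~ = 0.1431 as reported in the text
print(a, t0, alpha0, n, round(n) * c)
```

It yields $a \approx 0.39685$, $\tilde t_0 \approx 0.09921$ and $n \approx 36.06$, i.e., 36 missiles or about 0.36 billion dollars.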

RATIONALE 

Two aspects of the problem require some rationalization: the $k|\mu|$ cost for a wrong decision and the use of normal distributions. Incidentally, the linear cost of sampling makes sense even if there is a set-up cost, provided the cost of additional observations is approximately fixed per observation.

Suppose that the appropriate decision depends on the size of some unknown parameter $\mu$. We wish to make one decision if the parameter is large and an alternative decision if it is small. Generally that means that the loss (or payoff) for each decision is some function of the parameter. These two functions intersect at some break-even point $\mu^*$ of $\mu$. If these two functions are differentiable at $\mu^*$, the difference between them is approximately $k(\mu - \mu^*)$ for $\mu$ near $\mu^*$, where $k$ is the derivative at $\mu^*$ of the difference between these two functions. The loss for taking the wrong action is then approximately $|k(\mu - \mu^*)|$. By translating the parameter from $\mu$ to $\mu - \mu^*$, we have a break-even point at 0 and a loss of $k|\mu|$, where $k$ is positive.

The illustrative example involving the binomial shows how we can approximate other problems by the normal problem when we can rely on the central limit theorem. If exact results are called for, then this normal problem should be regarded as a convenient means of obtaining a rough approximation.

SUMMARY 

The nature of the solution, which calls for minimal sampling when little is known about $\mu$, was partly explained in the section on results. If indeed we are almost certain that $|\mu|$ is large, then that solution is appropriate. That resolution is unsatisfying to a frequentist, who would like a more robust solution likely to lead to a low expected loss no matter what the value of $\mu$ is. The frequentist might prefer a minimax procedure which would call for $n = 0.1933(\sigma k/c)^{2/3}$ observations. In that case the maximal risk is attained at $|\mu| = 0.7518\sigma/\sqrt{n}$. In the binomial illustration, this approximation would give $n = 48.7$.
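The minimax constants can be reproduced under one stated assumption: the frequentist decides by the sign of $\bar X$, so the risk at $\mu$ is $k|\mu|\Phi(-|\mu|\sqrt n/\sigma) + cn$. Writing $u = |\mu|\sqrt n/\sigma$, the worst case over $\mu$ is $k\sigma M/\sqrt n + cn$ with $M = \max_u u\Phi(-u)$, and minimizing over $n$ gives $n = (M/2)^{2/3}(\sigma k/c)^{2/3}$. The sketch below (plain Python, illustrative names) finds the maximizing $u$ by bisection on the first-order condition $\Phi(-u) = u\phi(u)$.

```python
import math


def phi(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def Phi(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def minimax_constants():
    """Maximize u * Phi(-u) over u > 0; return (u*, maximum M, (M / 2) ** (2/3))."""
    lo, hi = 0.0, 3.0                 # Phi(-u) - u phi(u) is > 0 at 0 and < 0 at 3
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if Phi(-mid) - mid * phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    m = u * Phi(-u)
    return u, m, (m / 2.0) ** (2.0 / 3.0)


def minimax_sample_size(k, c, sigma):
    """n minimizing the worst-case risk k sigma M / sqrt(n) + c n."""
    _, _, const = minimax_constants()
    return const * (sigma * k / c) ** (2.0 / 3.0)


print(minimax_constants())
print(minimax_sample_size(1.0, 0.01, 40.0))
```

It returns $u^* \approx 0.7518$, matching $|\mu| = 0.7518\sigma/\sqrt n$ in the text, a constant of about 0.1933, and about 48.7 observations for the binomial illustration.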
Both the Bayesian and the frequentist are inclined to dislike the severe restraint that the sample size must be determined before any testing is carried out. The potential advantages of sequential sampling, or at least double or triple sampling, should not be neglected, especially in those cases where the prior distribution is very vague. Relaxing this restraint will help considerably to ameliorate the difficulty.

In case we wish to consider further sampling after $n$ observations have been made, Figure 2 is still of value. The data have served to convert our prior $\tilde\alpha_0$ to a posterior $\tilde\alpha = \tilde Y/\sqrt{\tilde s}$ and $\tilde t_0$ to $\tilde t = \tilde s^{-1} = \tilde t_0 + \tilde n$. Replacing $(\tilde t_0, \tilde\alpha_0)$ by $(\tilde t, \tilde\alpha)$, we may use the figure to decide how large an additional sample should be, if only one more such choice is allowed. While Figure 2 is useful in telling us how large the second sample should be in a two-sample study, it does not help to tell us how large the first sample should be.

In the two-sample case we should expect the first sample size to be relatively small compared to $\tilde n$ when $\tilde n > 0$. However, the option of taking a small first sample should extend the $(\tilde t_0, \tilde\alpha_0)$ range over which we should do some sampling. If we decide to proceed in a fully sequential mode, deciding after each observation whether or not to continue sampling, then the appropriate sequential stopping boundary is given by the dashed curve in Figure 2, to the right of the curves describing $\tilde n$. In the sequential case the proper labeling of the axes would be $\tilde t$ and $\tilde\alpha$. Sampling should continue as long as $(\tilde t, \tilde\alpha)$ is to the left of the dashed curve.

In summary, the normal problem has the advantage of the normalization that makes the two-dimensional Figure 2 useful. The representation in terms of $(\tilde t, \tilde\alpha)$ is useful in two ways. First, $\tilde t$ measures the cumulated precision or information. Second, since $\tilde\alpha$ measures the number of standard deviations from 0, it provides a nominal significance level. For example, $\tilde\alpha = 1.5$ corresponds to the nominal significance level $\Phi(-1.5) = 0.067$.
Table 1  Stresses in Various Environments, A = ||a_ij||

i\j    1    2    3    4    5    6    7    8
  1    9    8    1    1    5    6    4    6
  2    2    1    9    8    5    6    7    5
  3    4    2    2    3    5    7    8    5
  4    9    1    8    8    5    6    7    5
  5    1    9    2    2    5    4    3    6
  6    3    3    3    3    5    7    7    6
  7    7    3    1    1    5    6    4    6
  8    3    5    1    1    5    7    5    6
  9    7    4    7    9    5    7    8    4
 10    7    5    8   10    5    7    8    5
 11    2    2    2   10    2    6    8    3
 12    8    5    5    5    5    5    4    3
 13    2    5    2    2    5    6    7    7
 14    3    4    3    3    5    6    6    8
 15    8    3    3    7    8    6    3    9
 16    8    2    5    8    8    7    7    9
 17    7    5    4    4    5    5    3    8
 18    7    4    7    7    5    4    6    4


Table 2  Stress Variables (Rows) and Environments (Columns) of Table 1

Rows
  temperature
     1. hot
     2. cold
     3. variability
  humidity
     4. dry
     5. humid
     6. variability
  dust
     7. particle size
     8. standard dev. of particle size
     9. windiness
    10. peaks of windiness
  altitude
    11. altitude
  demand
    12. heavy
    13. irregular
    14. peaks
  fuel
    15. available
    16. quality
  service
    17. parts available
    18. quality of personnel

Columns
   1. Saudi Arabia
   2. jungle
   3. North Pole
   4. Himalaya peak
   5. temperate rural ground level
   6. temperate rural hilly
   7. temperate rural mountain
   8. temperate urban
Table 3  Distance Matrix D

         1      2      3      4      5      6      7      8
  1   0.00  16.12  14.56  15.46  10.39  12.37  15.30  13.42
  2  16.12   0.00  16.67  20.07  11.75  14.18  16.73  13.86
  3  14.56  16.67   0.00   9.95  12.65  14.46  13.04  16.85
  4  15.46  20.07   9.95   0.00  14.39  14.00  11.87  17.46
  5  10.39  11.75  12.65  14.39   0.00   7.00  10.86   6.00
  6  12.37  14.18  14.46  14.00   7.00   0.00   6.40   8.06
  7  15.30  16.73  13.04  11.87  10.86   6.40   0.00  12.65
  8  13.42  13.86  16.85  17.46   6.00   8.06  12.65   0.00

Table 4  Result of cmdscale(D, k = 2, eig = F, add = F) and weights

Y = ||y_ij||

  i       1       2     w1    pr     w
  1   -0.89   -2.47     3     5    15
  2   -7.88   -6.95     2     3     6
  3    7.03   -5.25     1     1     1
  4   10.06   -0.08     1     1     1
  5   -3.22    0.52     5     5    25
  6   -1.81    4.90     3     5    15
  7    2.80    5.40     1     2     2
  8   -6.08    3.93     5     5    25

The rows represent the environments.

The first two columns are the coordinates in the two-dimensional space.

The next three columns are the weights representing importance, the prior probabilities that these environments will show up, and the product of those two. Note that the prior probabilities are not normalized to add to one. Also, the importance weight is labeled w1 while the product is labeled w, because the product will hereafter be referred to as the weight.




Table 5  Optimizing Points and Values for Different Values of b

               x1                 x2
    b      x11      x12      x21      x22          V
  .00    -3.22     0.52    -6.08     3.93     8.0(-2)
  .20    -2.88    -0.33    -5.83     3.65     5.1(-2)
  .30    -2.74    -2.80    -4.83     3.17     3.0(-2)
  .40    -0.87    -3.85    -4.58     1.24     1.3(-2)
  .50    -0.17    -0.12    -6.74    -0.13     6.2(-3)
  .60     0.65     1.45    -6.78    -0.45     3.3(-3)
  .80     1.59     1.42    -6.84    -0.67     1.0(-3)
 1.00     2.12     1.11    -6.42    -0.88     3.3(-4)
 1.50     2.53     0.86    -4.90    -1.29     1.1(-5)
 3.00     2.92     0.63    -4.79    -1.14     (illegible)

Values of V are written as mantissa(exponent); for example, 8.0(-2) means 8.0 × 10^-2.

Table 6

    n      s       r       q       t
    8      8   0.973   0.750   0.168
   10     10   0.978   0.795   0.107
   25     22   0.907   0.752   0.234
   50     42   0.869   0.754   0.307
  100     83   0.854   0.772   0.271
  400    327   0.832   0.790   0.209
OVERVIEW OF EXPERIMENTATION AT FORT HUNTER LIGGETT

(Introduction to Special Session on “Forty Years of Experimentation at Fort Hunter Liggett”)

Carl  T.  Russell 

Chief  Scientist,  TEXCOM  Experimentation  Center 
Fort  Hunter  Liggett,  California  93928-8000 

TEXCOM Experimentation Center began its life as the Combat Developments Test and Experimentation Center on 1 November 1956 and first became the Combat Development Experimentation Center (CDEC) on 1 January 1957. Subsequent history is dominated by the name “CDEC,” as is clear from the following table.


CDTEC   Combat Developments Test and Experimentation Center              11/01/56-12/31/56
CDEC    Combat Development Experimentation Center                        01/01/57-06/30/62
CDEC    Combat Developments Experimentation Center                       07/01/62-04/30/63
CDCEC   Combat Developments Command Experimentation Center               05/01/63-03/22/65
CDCEC   Combat Developments Command Experimentation Command              03/23/65-08/31/71
CDEC    Combat Developments Experimentation Command                      09/01/71-03/22/83
CDEA    Combat Developments Experimentation Activity                     03/23/83-07/01/83
CDEC    Combat Developments Experimentation Center                       07/02/83-11/02/88
TEC     TRADOC Test and Experimentation Command, Experimentation Center  11/03/88-11/14/90
TEC     Test and Experimentation Command, Experimentation Center         11/15/90-09/30/97

The name has always contained “Experimentation,” and until 1988 it always started with “Combat Developments.” This is important because CDEC was established expressly for experimenting with organizational concepts as well as doctrinal and materiel concepts. As such it had no predecessor and no existing body of experimental methodology; that had to be learned and developed from scratch during the early days. From the beginning, experimentation at Fort Hunter Liggett has concentrated on performing Real-Time Casualty Assessment (RTCA) in force-on-force trials to make those trials replicate combat as closely as possible, and on measuring the results of those trials as accurately as possible, and instrumentation has always played a prominent part. From the late 1970’s onward, CDEC’s workload became more and more oriented towards operational testing, partly accounting for the name change in 1988. With increased spending constraints in DoD, the Army has determined that maintaining an experimentation facility at Fort Hunter Liggett is no longer affordable, so the Command will inactivate effective 30 September 1997.

Although they have historical content, the talks in this Special Session are primarily about harvesting and interpreting data for making important decisions. The talks by no means fully document the technical history of CDEC, but as a varied series of vignettes they sketch CDEC’s role in Army experimentation over the past forty years.

Most technical support for CDEC has always been in the hands of a government contractor. In the earliest years, this contractor worked directly for the Commanding General. Until 1966, when it became known as the Scientific Support Laboratory (SSL), the contractor was known as the “Research Office.” Floyd Hill was hired in the summer of 1956 to start staff recruitment and begin program planning for CDEC. By November 1st he had office space in Monterey and a staff of ten professionals to continue planning and methodology development for the first experiment at CDEC in March 1957. When CDEC was made permanent in 1958, Stanford Research Institute won the competition. Requirements for new instrumentation to meet CDEC data needs continued to increase, and from 1962 to 1966 much instrumentation was developed and fielded under the direction of Henry Alberts, who headed SSL instrumentation again in 1980-81.
[Figure: Forty Years of Experimentation at Fort Hunter Liggett. Series: CDEC. Vertical axis: Number (linear). Horizontal axis: date, 11/1/56 to 11/1/96.]


In July 1968, Walter Hollis was appointed to the newly created position of “Scientific Advisor” at CDEC, supplementing technical advice to the Commanding General from the contractor side with advice from the government side. During his tenure at CDEC, experimentation transitioned from stopwatch to computer as instrumentation and automation capabilities advanced. Marion Bryson replaced Mr. Hollis as Scientific Advisor in 1973, and he became the Director of CDEC in 1983, remaining until he left to become TEXCOM Technical Director in 1991. As you can imagine, much changed during his tenure at CDEC. Bill West came to CDEC in 1985 as Chief Scientist under Dr. Bryson and remained after Dr. Bryson left, as Chief Scientist and Deputy Director. Carl Russell replaced Bill West as Chief Scientist in 1993. James Prouty has been the TEC Commander since August 1995, but COL Prouty spent substantial time at CDEC earlier in connection with TASVAL and Apache Hellfire testing.

The  current  TEXCOM  Technical  Director,  Brian  Barr,  was  assigned  to  CDEC  as  a  Captain  in  1975-78,  and  he 
returned  as  a  Major  in  1979-80.  His  paper  will  address  some  classic  data  from  the  early  1970’s.  In  1979  as  a  new 
IDA  employee  fresh  from  teaching  physics  at  Yale,  Ernest  Seglie  cut  his  DoD  teeth  on  TASVAL,  and  he  has 
returned  to  Fort  Hunter  Liggett  many  times  since,  mostly  in  his  oversight  role  as  Scientific  Advisor  to  the  Director 
of Operational Test and Evaluation. He is the only person shown on this slide who was never assigned to CDEC.

Dr.  Seglie  initiated  the  National  Research  Council  project  which  Herman  Chernoff  discussed  this  morning,  and 
his  paper  this  afternoon  will  assess  the  importance  of  high-resolution  RTCA  in  operational  testing.  Ed  Buntz,  the 
current  instrumentation  chief,  came  to  CDEC  as  a  Captain  in  1980,  was  promoted  to  Major  here,  and  like  many 
others  who  ended  their  military  career  at  CDEC,  he  never  left.  Mike  Tedeschi,  chief  of  methodology  under  Mr. 
Buntz,  joined  CDEC  in  1981.  Together,  they  have  led  an  effort  which  not  only  made  RTCA  instrumentation  mobile 
but  also  produced  what  is  arguably  the  Army’s  best  After  Action  Review  capability. 

At  the  final  session  of  the  day,  Mr.  Barr  will  moderate  a  panel  in  which  Mr.  Hollis,  Dr.  Seglie  and  Dr.  Bryson 
discuss  the  future  of  field  experimentation. 
RECOLLECTIONS OF FIRST YEARS OF CDEC -
SEPTEMBER  1956  TO  JUNE  1958 

Floyd  1.  Hill 

Associate  Director,  Research  Office  Experimentation  Center 


ABSTRACT 

Because the documentation for these years was largely destroyed, the paper is entitled
“Recollections.” Mr. Hill was hired because of his experience in designing and directing the first
Operational Test, Project STALK, conducted at Fort Irwin August-December 1953. This test is
briefly outlined, and its influence on the first CDEC experiments is frequently noted. The first major
supported company-sized organizational experiment, which used the two-sided operational game as a
model, is described in terms of its instrumentation and umpire procedures. Some findings are given.
The reasons underlying the breakdown of the contractual relationship between General Fred Gibb,
Commanding General of CDEC, and the Research Office are given. Some of the subsequent tests,
including the Scouting Experiment and the Helicopter Experiment, are discussed. In no case did
the results of these operational tests (including STALK) conform to existing policy. They were,
therefore, rejected.


OPENING 

Thank  you  for  coming.  My  special  thanks  are  given  to  Carl  Russell,  who,  in  his  invitation  call, 
speculated  from  his  conversations  with  me  over  the  years  that  Project  STALK  was  the  first  CDEC 
experiment. Note that, due to document destruction, the available record of those times is at best
fragmentary and sometimes scrambled. These are my recollections, which are, no doubt,
selective in nature.


COMMAND  IMPLEMENTATION 

The CG of TRADOC (which included the Combat Development Command [CDC] at that time),
General Willard Wyman, had assigned 34 senior staff officers to judge Reorganized Current Infantry
Division (ROCID) field exercises in the Spring of 1956. The disturbing result was that 17 of the 34
judged ROCID to be very good and 17 judged it to be very bad. General Wyman decided that he
needed objective information. So, with no appropriated R&D funds, he directed that: funding come
from O&M funds; troop support and CDEC staff housing be provided by the 7th Infantry Division
at Fort Ord; training and testing areas be at Camp Roberts and Hunter-Liggett Military Reservation
(HLMR); and the TRADOC resident scientific support contractor, Technical Operations Inc. (TOI),
staff and house a 20-man scientific support group to be “Headed by a Ph.D.” In September 1956 I
was hired by TOI’s President, Dr. Fred Henriques, to be Associate Director of the planned Research
Office of the Experimentation Center (ROEC), and I was given the task of building the staff. At first, I was
the only person on the ROEC staff.
WHY I WAS HIRED (PROJECT STALK)


Dr. Henriques hired me because I had been Technical Director of Project STALK, a joint
Ballistic Research Laboratories/CDC comparative test, run at Camp Irwin, California, of 11 different
tank/fire control systems mounted on 5 tanks: the Baseline M4A3E8, the (Light) T41, and the
(Medium) M47, M47E1 (Stabilized), and T48. The effectiveness measure of the test was the time from
target acquisition to a hit on suddenly appearing targets: 5 stationary (6-foot by 6-foot) targets
placed within 90° to either side of the tank's axis of travel along a trail about 2,500 yards long. The
five targets were distributed in four 500-yard range brackets between 250 and 2,250 yards. The
experimental design was an 11 x 11 Graeco-Latin Square treating the 4 main effects of Tank/Fire
Control System Combination, Tank Crew, Test Course, and Order of Crew Testing. It was the first
Army Operational Test and the only one (that I know of) that measured the effect of Player
Uncertainty about where and when the target would appear, by comparing the hitting performance of
each tank/fire control system on a single Training Test Course to that on the 11 Record Courses,
which were traversed only once by each tank crew. Twenty-five tank crews were trained in 5 platoons
for the first phase of testing; each crew then rotated to, trained on, and fired a different tank in the
next phase. There were 5 phases in all, and 13,000 main gun rounds were fired. Firing at each target
continued until a hit was obtained. Detection of the Day-Glo-paper-marked targets was no problem on
the Training Test Course, which each crew fired 11 times, but it was a problem on the Record Courses:
over one-half of the targets had to be pointed out to the Tank Commander by the Tank Controller
(after the target had already been in view for 200 yards of tank travel). The joint sponsor of Project
STALK (CDC) and nearly all of the R&D community rejected the test results, despite extensive
coverage of the test by numerous observers who found little to fault. The results were deemed
“Too Controversial.” The most “controversial” results were:

ONE:  The  Baseline  M4A3E8  achieved  the  fastest  time  from  target  acquisition  to  hit  on  the 
Record  Courses  irrespective  of  range  or  other  main  effects. 

TWO: The T48 with the range finder/ballistic computer fire control combination was the
slowest of all the tank/fire control systems on the 11 Record Courses. The newest and “best”
was worst.

THREE: The foregoing results were nearly reversed on the Training Test Course, which was
fired 11 times by each tank crew.

FOUR:  First  round  hitting  probability  (the  R&D  community’s  conventional  measure)  and  the 
time  to  hit  were  simply  not  correlated. 
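An 11 x 11 Graeco-Latin square like the one used in Project STALK superimposes two orthogonal Latin squares: each row and column holds every symbol of each alphabet exactly once, and every (Latin, Greek) pair occurs in exactly one cell. The paper does not say how the STALK square was built, so the sketch below uses a standard cyclic construction that works for any odd order. The function names and the assignment of factors to rows, columns, and symbols are hypothetical.

```python
from itertools import product


def cyclic_graeco_latin(n):
    """Return two orthogonal n x n Latin squares for an odd order n.

    Cell (i, j) gets Latin symbol (i + j) % n and Greek symbol (2*i + j) % n.
    Because 2 is invertible modulo an odd n, both squares are Latin, and the
    pair of symbols recovers (i, j), so every pair occurs exactly once.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("this cyclic construction needs an odd order n >= 3")
    latin = [[(i + j) % n for j in range(n)] for i in range(n)]
    greek = [[(2 * i + j) % n for j in range(n)] for i in range(n)]
    return latin, greek


def is_latin(square):
    """Each row and each column holds every symbol 0..n-1 exactly once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


def are_orthogonal(a, b):
    """Every (a, b) symbol pair appears in exactly one cell."""
    n = len(a)
    pairs = {(a[i][j], b[i][j]) for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


# Hypothetical assignment: rows = crews, columns = order of testing,
# Latin symbol = tank/fire control system, Greek symbol = test course.
latin, greek = cyclic_graeco_latin(11)
print(is_latin(latin), is_latin(greek), are_orthogonal(latin, greek))  # True True True
```

Because 11 is odd, the cyclic rule is enough. Even orders such as 4 need a different construction.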

Despite  the  Project  STALK  results,  I  was  hired  by  Dr.  Henriques  based  on  pressure  from  some 
source.  I  have  no  idea  of  the  source.  Almost  certainly  the  pressure  to  hire  passed  through 
General  Wyman.  I  quit  my  job  at  Operations  Research  Inc.  and  commuted  from  Washington  to 
TRADOC  at  Fort  Monroe,  VA. 
DIRECTION AND STAFFING


When  CDEC  opened  1  November  1956  at  Fort  Ord,  Brigadier  General  Fred  Gibb  and  I  were 
faced  with  General  Wyman’s  command  directive  to  complete  the  first  step  of  ROCID  testing  by 
1  July  1957,  when  California  reserve  troop  training  would  begin  at  HLMR.  General  Gibb  had  a  staff 
of 36 senior combat-experienced officers. None had any test experience. Before Christmas, the
ROEC staff was assembled. It consisted of my new hires, except for Dr. Frank Brooks, the
Director, who was supplied from the TOI TRADOC staff. In all, there were 7 Ph.D.’s and 9 MS+ (including
me). Only 4 were BS-level, including two high-speed computer specialists and one person with
Project STALK experience, George Scott, whom I had first hired at BRL. There was one post-WWII
retired Army Armored Officer, Colonel Wesley W. Yale, whom I had interviewed at the strong
urging of General Wyman. Colonel Yale was one of the smartest men I have ever known, and he
had a superb knowledge of strategy, tactics, terrain, and the pre-WWII Army organizational and
planning exercises.


PLANNING  THE  FIRST  FIELD  EXPERIMENT 

Not surprisingly, nearly all of the elements of the January 1957 Outline Test Plan were prepared
in the ROEC offices with strong CDEC staff representation. Fort Ord was still modifying barracks
to accommodate CDEC Headquarters. The resulting Plan drew heavily on Project STALK
and on Colonel Yale’s expertise. It included:

ONE: The proposal to test 4 alternative ROCID company-level candidate organizations. These
were called Integrated Combat Group (ICG) Company Organizations. Each ICG candidate
organization would be tested by all four of the “friendly” companies assigned. One aggressor
motorized company, augmented by tanks, antitank weapons, machine guns, and mortars, would
remain the same throughout the trials. Each ICG Company Group was a ROCID company
alternative augmented by tanks, antitank weapons, and mortars. Both the “aggressor” and the
“friendly” companies would be supported by artillery. Both forces would be given a mission
assignment and tactical boundaries, with relatively free play in the course of 5 interconnected
trials over terrain not previously operated on by the “friendly” (ROCID candidate) company
personnel.

TWO:  The  test  would  use  a  4  x  4  Graeco-Latin  Square  experimental  design  to  treat  the  main 
effects  of  ROCID  company  organization;  “friendly”  company  personnel;  combat  terrain  and 
situation;  and  order  of  testing.  There  would  be  4  Record  Courses  and  a  separate  Training  Test 
Course  at  HLMR.  Training  and  retraining  of  the  “friendly”  companies  would  be  at  Camp 
Roberts. 

THREE: The data record would include the space-time, response-time, and target
characteristics of the opposing forces, as well as casualty assessment and deletion from play. Two
umpire companies were requested from Fort Ord, to be trained to report this information to a
Master Control Station and to follow its casualty ascriptions in designating specific casualties.
These umpire/controllers were to be assigned to each squad, tank, and antitank weapon, as well
as to platoon and company leadership of friendly and hostile forces. They would be trained on the
Record Courses and the Training Test Course.

FOUR: The test scenarios (4 for the Record Courses and 1 for the Training Test Course) were
arranged so that: tactical moves would be mainly along the wooded ridges at HLMR; mission
objectives were primarily on high ground; and the open valleys would be crossed but used as
advance or withdrawal routes as little as possible. All 5 mission objectives for each
trial were the attack or defense of a specified piece of terrain. Colonel Wesley Yale was a
dominant force in scenario design. It also must not be forgotten that the 2nd Division’s
experience in the withdrawal from the Yalu River was still fresh in the minds of these combat-experienced
officers, and the new Army slogan, “We will not fight for real estate,” meant little operationally
to them.

FIVE: The measure of the relative effectiveness of the ROCID candidates would be some
combination of Enemy Casualties, Friendly Casualties, and Time of Mission Accomplishment.
Many of the ROEC staff wrestled with finding a single combined expression, but came to the
conclusion that the three measures might instead be used as a three-dimensional vector. More about this later.

INSTRUMENTATION,  COMMUNICATIONS,  AND  CONTROL 

Until nearly the end of the rainy season, CDEC and ROEC wrestled with instrumentation,
communications, and control. In addition, CDEC was heavily involved in administering the training
of the 7th Infantry Division units, all of which were drawn from the 10th Regimental Combat Team,
commanded by Colonel William Montgomery. The communication system was designed almost wholly
from standard Signal Corps equipment by a colonel whose first name was “John.” He was
heavily supported by the Chief Signal Officer, who, when he came to be briefed, replied to the
Colonel with “Anything you say, John.” Some elements of the Plan were concerned with unit
position measurement. ROEC recommended that the areas of the tactical scenarios be overlaid with
a 100-yard by 100-yard grid composed of wooden 2x4 stakes projecting 5 feet above the ground.
Each side of a stake was color-coded and painted with the stake’s 5-digit identification number.
The Corps of Engineers surveyed in, installed, and maintained these stakes. Maintenance, while not
extensive, was irritating because grazing cows tended to push the stakes over. Each field umpire/controller
radioed in the position of his attended unit whenever it moved or engaged the opposing force, by
estimating his distance and compass bearing from the observed post. When firing occurred, he also
radioed an estimated range and compass bearing to the target, the target type, and his estimate of
the number, exposure, and posture of the target.
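The position-reporting step amounts to dead reckoning from a surveyed stake. The sketch below turns a stake's coordinates plus an estimated distance and compass bearing into a grid position, and optionally chains a range and bearing to locate the target. Several things here are assumptions, not details from the paper: the survey table, the stake IDs, the yard-based grid, the bearing being measured from the stake toward the unit (a bearing taken toward the stake would need 180 degrees added), and the neglect of magnetic declination.

```python
import math

# Hypothetical survey table: 5-digit stake ID -> (east, north) in yards.
STAKES = {
    "10203": (200.0, 300.0),
    "10204": (300.0, 300.0),
}


def offset(east, north, distance_yd, bearing_deg):
    """Move distance_yd from (east, north) along a bearing clockwise from north."""
    theta = math.radians(bearing_deg)
    return east + distance_yd * math.sin(theta), north + distance_yd * math.cos(theta)


def locate(stake_id, dist_to_unit, bearing_to_unit,
           range_to_target=None, bearing_to_target=None):
    """Return (unit_position, target_position or None) from one umpire report."""
    unit = offset(*STAKES[stake_id], dist_to_unit, bearing_to_unit)
    if range_to_target is None:
        return unit, None
    target = offset(*unit, range_to_target, bearing_to_target)
    return unit, target


# A unit 40 yards due east of stake 10203, firing at a target 600 yards due north.
unit, target = locate("10203", 40, 90, range_to_target=600, bearing_to_target=0)
print(unit, target)
```

Any error in the distance estimate carries directly into both the unit and target positions, which is why the spacing of the stakes mattered.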

FIELD  UMPIRE  ACTIONS 

The space-time, response-time, and target characteristics were data elements supplied by the field
umpires/controllers. The umpire’s actions were to select and designate casualties in the unit he
monitored, using the numbers radioed to him by the Master Control Center, located in a tent near the
Hearst Mansion. He also fired simulators for the weapons that did not have smoke and flash
simulators. These included mortars, bazookas, and 106-mm recoilless rifles. When he received
news of indirect fire on the unit’s position, he also set off smoke and flash simulators at the point
of impact provided to him by the Master Control Center. Tape recordings were made of all radio
communication among the units.

THE  MASTER  CONTROL  CENTER 

According to the CDEC final report on the Umpire Techniques and Procedures Experiment, dated
September 1957, the Center was in a tent over an 8-foot by 8-foot horizontal panel carrying a map
representation of the area being played, with the numbered 2x4 posts identified on the
100 x 100-yard grid. An acetate overlay contained the unit location markings. Two Senior Plotters, with
the help of 4 Aggressor and 4 Blue plotters seated on either side of the board, kept the positions up
to date. Behind and physically above the 4 plotters on each side were 3 platoon umpires and one
antitank umpire. Well away, operating a second, smaller-scale panel of the playing area, were a
single indirect fire plotter and one indirect fire umpire. After checking the plotting board, the
umpires translated the number of rounds fired, the range to target, and the target posture into a number
(if any) of casualties on the receiving target. This information was transmitted to the appropriate
umpire on the other side of the board, who radioed it to the appropriate field umpire/controller.
The casualty information was derived from weapons effects data that had been
“Monte-Carloed” into a distribution of specific outcomes, using computer time available up and
down the West Coast. The platoon umpires had tables of these outcomes for the different direct fire
weapons by number of rounds fired, range, and target posture. Each time an outcome was used,
a line was drawn through it, with the time and unit identifier noted beside it. When the next similar
action was reported, the next number of casualties in the list was lined out, and so on. Indirect fire and
antitank fire received similar treatment. These marked pads were the principal raw data source.
They were collected each day.
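The pad procedure can be sketched as two steps. First, draw a fixed list of casualty outcomes in advance for each fire condition. Then hand the outcomes out strictly in order, logging the time and unit for each one used. The casualty model here is a hypothetical stand-in for the weapons effects data, which the paper does not give: each round independently causes one casualty with a fixed probability, capped at the number of men exposed. The condition key, probabilities, and names are also illustrative.

```python
import random


def monte_carlo_pad(p_per_round, rounds, max_casualties, n_draws, seed=0):
    """Pre-draw a list of casualty outcomes for one fire condition.

    Hypothetical model: each round independently causes one casualty with
    probability p_per_round, capped at the number of men exposed.
    """
    rng = random.Random(seed)
    pad = []
    for _ in range(n_draws):
        hits = sum(1 for _ in range(rounds) if rng.random() < p_per_round)
        pad.append(min(hits, max_casualties))
    return pad


class UmpirePad:
    """Issues pre-drawn outcomes strictly in order and logs each use,
    like lining out an entry and noting the time and unit beside it."""

    def __init__(self, outcomes):
        self.outcomes = list(outcomes)
        self.next_index = 0
        self.lined_out = []  # (time, unit, casualties) per used entry

    def assess(self, time, unit):
        if self.next_index >= len(self.outcomes):
            raise IndexError("pad exhausted; a fresh table is needed")
        casualties = self.outcomes[self.next_index]
        self.next_index += 1
        self.lined_out.append((time, unit, casualties))
        return casualties


# Hypothetical key: (weapon, rounds fired, range bracket, target posture).
KEY = ("machine gun", 20, "300-500 yd", "prone")
pads = {KEY: UmpirePad(monte_carlo_pad(0.05, 20, 9, n_draws=50, seed=42))}

first = pads[KEY].assess("0912", "Blue 2nd Platoon")
second = pads[KEY].assess("0915", "Blue 3rd Platoon")
```

Drawing the outcomes ahead of time keeps the umpire's step to a table lookup. The lined-out entries then double as the day's data record.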

TANK  INSTRUMENTATION  AND  CONTROL 

Tanks were a special problem because of the range and accuracy of their fire. The difficulty was
that the tank umpire/controller, riding in the loader’s position, could not necessarily know whether the
tank gun was aimed correctly at a target that he might, or might not, see. Broadview Research
designed, developed, delivered, and mounted on the tank guns a boresighted, collimated auto headlamp
that the tank umpire turned on when the tank gunner fired. By moving the headlamp forward and
backward in the collimating tube, the beam could be matched to the 0.10 to 0.20 mil
dispersion of the tank rounds. The target umpire sighted to see if the light was on the target. If so,
he radioed Master Control, which told him the extent of casualties to assess. Broadview
Research supplied 20 of these lights plus attachment cables for $200 apiece.
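The arithmetic below shows how narrow a 0.10 to 0.20 mil beam is at tank-fighting ranges. This narrowness is what lets the target umpire judge whether the gun was laid on the target. The paper does not say which mil is meant, so the sketch assumes the U.S. Army mil (1/6400 of a circle, about 0.98 milliradian). It also uses the small-angle approximation.

```python
import math

# Assumption: the U.S. Army angular mil, 1/6400 of a full circle.
MIL_IN_RADIANS = 2 * math.pi / 6400


def spread_yards(range_yd, dispersion_mil):
    """Width subtended by an angular spread at a range (small-angle approximation)."""
    return range_yd * dispersion_mil * MIL_IN_RADIANS


for range_yd in (500, 1000, 2000):
    for mil in (0.10, 0.20):
        inches = spread_yards(range_yd, mil) * 36
        print(f"{range_yd:>5} yd, {mil:.2f} mil -> {inches:.1f} in")
```

At 1,000 yards, 0.20 mil spans about 0.196 yards (roughly 7 inches), so the lit spot is far smaller than a tank-sized target.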

GENERAL  WYMAN’S  VISIT 

General Wyman visited CDEC in May, when 2 of the 4 phases of testing had been completed.
After being briefed, he commented, “The whole thing looks too ROCIDy to me.” Brigadier
General Gibb silenced the numerous protests concerning TRADOC directives and said, “We will
change the name to the ‘Umpire Techniques and Procedures Experiment (UT&P),’” adding that CDEC
would report only on this and would not discuss the efficacy of the “hostile” and “friendly” units’ organization
and tactics.
RESULTS OF THE UT&P EXPERIMENT


The principal effectiveness measure of the UT&P was the time between one field
umpire/controller reporting an action and the opposing umpire/controller receiving the
casualty assessment information. The CDEC report showed that approximately 62% of all these
actions in the last 2 of the 4 phases were completed within one minute. It was also concluded that
the number of units that could be umpired in this way would be limited only by the number of trained
umpire/controllers. The radio and radio-relay system of tube radios designed and provided by the Signal
Corps was deemed adequate. The umpire system tested thus had only modest room for
improvement. ROEC, probably in violation of General Gibb’s orders, explored the range
distribution of tank-versus-tank fire and found it to be essentially the same as that recorded in Northwest
Europe in WWII: its mean was approximately 670 meters and its median about 500 meters. This,
of course, was not reported in the CDEC report. In addition, while no single expression combining
enemy casualties, friendly casualties, and time of mission accomplishment was found, there was no
need for such a number in evaluating the candidate ROCID and aggressor companies. The
phenomenon observed in the Record trials was that, in every comparable trial, the attacking company
(either Red or Blue) that accomplished its mission fastest inflicted the heaviest casualties and suffered
the fewest losses. This also could not be reported.
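The observation that no single combined number was needed can be expressed as Pareto dominance over the three-dimensional vector of measures. One outcome dominates another if it is at least as good on every measure and strictly better on at least one. When the fastest attacker also has the most enemy casualties and the fewest friendly losses, it dominates, and no weighting formula is required to rank it. This is a standard technique named here for illustration, not ROEC's documented procedure, and the numbers in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialOutcome:
    minutes_to_objective: float   # lower is better
    friendly_casualties: int      # lower is better
    enemy_casualties: int         # higher is better


def dominates(a, b):
    """True if a is no worse than b on all three measures and better on one."""
    no_worse = (a.minutes_to_objective <= b.minutes_to_objective
                and a.friendly_casualties <= b.friendly_casualties
                and a.enemy_casualties >= b.enemy_casualties)
    strictly_better = (a.minutes_to_objective < b.minutes_to_objective
                       or a.friendly_casualties < b.friendly_casualties
                       or a.enemy_casualties > b.enemy_casualties)
    return no_worse and strictly_better


# Hypothetical attacking companies in one comparable trial.
fast = TrialOutcome(minutes_to_objective=40, friendly_casualties=5, enemy_casualties=20)
slow = TrialOutcome(minutes_to_objective=75, friendly_casualties=12, enemy_casualties=9)
print(dominates(fast, slow), dominates(slow, fast))
```

If neither outcome dominates the other, for example when one is faster but bloodier, ranking them would require the weighting that ROEC never settled on.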

PAUL  ERDOS 

The major step in the disintegration of the CDEC-ROEC relationship occurred in August or
September 1957. I was an invited speaker at a MORS meeting at Stanford, and I was accompanied by the
very astute Staff Officer to General Gibb, Colonel Harold Marx. The MORS keynote address was
given by the great Hungarian mathematician, Dr. Paul Erdos, who died at 83 this past September.
It was a very erudite speech on recent advances in Game Theory, one of the most advanced
fields of that time. He concluded with: “I see a glimmer, as of the rising sun on a distant horizon,
the use of two-sided operational games to predictably measure the outcome of military and corporate
operations and strategies.” That afternoon I gave my paper on the CDEC approach. When the chair
asked for questions or comments, Dr. Erdos strode to the center of the stage, raised his hands high,
and almost shouted, “I was wrong, the sun has already risen high!” Colonel Marx looked at me
peculiarly when I got back to my seat. The next day General Gibb called me into his office and made
the following points, which I believed then and believe now were absolutely sincere:

ONE:  He  felt  deceived  because  I  had  not  told  him  that  the  two-sided  operational  game  had  never 
before  been  tried.  (Note  that  I  had  used  this  term,  from  the  first,  to  describe  the  test  model). 

TWO:  He  believed  that  Good  Science  was  the  application  of  established  and  thoroughly  proven 
methods. 

THREE: He had expected that use of the Scientific Method and Objective Science would reduce
the effort required to develop Organizations and Procedures. Rather, he had to drive his staff and
troops harder than he did when he was with the 1st Infantry Division in WWII.

FOUR: He felt that he could no longer place complete trust in the ROEC scientific support, and
would seek outside expert help in future CDEC work.

REPLACEMENT OF THE 100 x 100-YARD STAKES

The stakes were replaced, over my warning, with 20-foot steel posts on a 500-yard by 500-yard grid,
each carrying a large box on top with the identifying markers. In the Mobility Experiment, it was
frequently found that the field umpire/controllers could not find, or could not see, the boxes.
The space-time position measurement now had big errors, due to search problems (Shades of Project
STALK!!) and to trees whose branches spread at 10 to 15 feet above the ground. Moreover, even when the
new posts were observed, the average error in the umpire/controllers’ position estimates
increased from about 10 yards to about 50 yards. Once again it was found that mean range
estimation error is about 20% of true range! Remember, the Corps of Engineers did not like the
wooden stakes. In addition, based on a proposal by IBM-San Jose, an IBM 1620 computer
was installed at the Master Control Center, and a computer-controlled, vertical, back-lighted panel
replaced the Master Control board. This provided a good view for visiting dignitaries but was of
diminished help to the controllers. Most importantly, the 1620 was used to solve the Monte Carlo
selection of casualties from weapons effects functions. The resulting queuing increased the time
from field umpire input to message receipt by the umpire of the opposing force from
1 minute to 6 minutes for 62% of the casualty assessments. Position reports from the
500 x 500-yard posts also queued at the computer. Something had to be done.
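A simple model shows why widening the grid fivefold should raise the position error about fivefold. Assume each umpire estimates his distance from the nearest post, with an error of about 20% of that distance. Then the expected error is 0.2 times the mean distance to the nearest post, and that mean distance scales linearly with grid spacing. The sketch computes this mean both by Monte Carlo sampling and in closed form. The nearest-post and proportional-error assumptions are simplifications, not findings from the paper.

```python
import math
import random


def mean_distance_to_nearest_post(spacing, n_samples=100_000, seed=1):
    """Monte Carlo mean distance from a uniformly placed unit to the nearest
    post of a square grid with the given spacing (same units as spacing)."""
    rng = random.Random(seed)
    half = spacing / 2.0
    total = 0.0
    for _ in range(n_samples):
        # Offsets to the nearest post along each axis are uniform on [0, spacing/2].
        total += math.hypot(rng.uniform(0.0, half), rng.uniform(0.0, half))
    return total / n_samples


def mean_distance_closed_form(spacing):
    """Exact value: a * (sqrt(2) + asinh(1)) / 3 with a = spacing / 2."""
    a = spacing / 2.0
    return a * (math.sqrt(2.0) + math.asinh(1.0)) / 3.0


ERROR_FRACTION = 0.20  # "mean range estimation error is about 20% of true range"
for spacing in (100, 500):
    d = mean_distance_closed_form(spacing)
    print(f"{spacing}-yd grid: mean distance {d:.1f} yd, "
          f"implied mean position error {ERROR_FRACTION * d:.1f} yd")
```

Under these assumptions the implied errors are roughly 8 and 38 yards. The model reproduces the fivefold growth but not the exact reported figures, which also reflect the search and visibility problems described above.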

TRILATERATION 

In the face of my remonstrations, CDEC took proposals from such contractors as Cubic
Corporation for optical or electronic trilateration schemes. I proposed using the British Bendix
low-frequency hyperbolic grid scheme being used in Portsmouth Harbor, with the expectation that it could be
used in hilly, tree-covered terrain. It was rejected because it was British. (More shades of Project
STALK, where the Centurion tank was not allowed by DA because it was British.) Clearly, of
course, trilateration requires moving the forces being tested into open terrain.
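Range-based trilateration, as in the contractor proposals, locates a player from measured distances to three surveyed stations. Subtracting one circle equation from the other two leaves two linear equations in x and y. The sketch below is a minimal, noise-free 2-D version with hypothetical station positions. The hyperbolic scheme Hill proposed works from range differences rather than ranges, and it is not shown here.

```python
def trilaterate(stations, ranges):
    """Solve for (x, y) from ranges to three surveyed stations.

    Subtracting the first circle equation from the other two gives two linear
    equations in x and y, solved here by Cramer's rule.
    """
    (x1, y1), (x2, y2), (x3, y3) = stations
    r1, r2, r3 = ranges
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear; position is not determined")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y


# Hypothetical surveyed stations (yards) and measured ranges to a player.
stations = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
print(trilaterate(stations, [500.0, 650000 ** 0.5, 450000 ** 0.5]))  # approximately (300, 400)
```

Each station needs line of sight (optical) or a clear signal path (electronic) to the player, which is why these schemes push the forces into open terrain.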

DR. IAN TERVETT

In the Fall of 1957, Dr. Ian Tervett replaced Dr. Frank Brooks as Director of ROEC. Dr. Fred
Henriques of TOI made this change because of Frank’s health problems. Dr. Tervett had recently left the
U.S. Army Chemical Corps, where he had been a Civil Servant and where his major research work had been on Chemical
Munitions. He had some experience in testing them. He strongly felt that General Gibb needed a
Civil Service Chief Scientist, a position he took himself shortly after the demise of the ROEC contract in
June 1958.


SCOUTING  AND  HELICOPTER  EXPERIMENTS 

In the last year of its contract, ROEC designed and supported several tests. Among these was
the Scouting Experiment, where it was found that the number of hostile detections by U.S. Army
scouting units had a very high correlation with the number of scouting observers, regardless of their
mode of transportation (Jeep, armored vehicle, helicopter, or foot) or the scouting tactics
used. Line of sight was a necessary, but not sufficient, condition for target detection. (More shades
of Project STALK!) A helicopter-borne scout was recorded as acquiring twice as many detections
if the detections by the pilot were not taken into account. I have no copy of this CDEC report, if one
was ever made. I recall serious criticism because the PPS-4 was not used. This was because it could
not function after Jeep transportation to the test site; it was a very early model. A Helicopter
Experiment was run using gun cameras and helicopter vulnerability data from BRL for various
antiaircraft and other direct fire weapons, such as rifles. The controversial finding was that low-level
helicopter flights over variable terrain would expose the helicopters to being hit frequently. In 1958, the
conventional wisdom was that this could not happen, because of our experience in Korea, where the CAA
did not fire on our medical helicopters removing Allied and Chinese casualties from the battlefield.
This finding was not reported by CDEC. Rather, the Operations Research Office was brought in to design
and execute essentially the same test using Photo-Theodolites. ORO’s draft report on its test came
out about 4 to 6 years later, as I recall. In any case, its evidence was the same.

RELOCATION 

Although SRI offered me a job, as it did the rest of the technical staff, and TOI offered me one in Boston,
I moved to Virginia in 1958 to work on a Combat Surveillance contract between Cornell
Aeronautical Laboratory and the Signal Corps Combat Surveillance Agency. I worked on
the concept definition and testing of the SD-2 Surveillance Drone.

THANK  YOU!! 


Example of a 4 x 4 Graeco-Latin Square

                CAR
DRIVER    1     2     3     4
I         Aα    Bβ    Cγ    Dδ
II        Bδ    Aγ    Dβ    Cα
III       Cβ    Dα    Aδ    Bγ
IV        Dγ    Cδ    Bα    Aβ

Additives: A, B, C, D
Days: α, β, γ, δ
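The 4 x 4 example can be checked mechanically. The sketch below encodes the square with drivers as rows and cars as columns, taking the Latin letter as the additive and the Greek letter as the day. It then confirms that each letter appears exactly once in every row and column and that every additive-day pair occurs exactly once. The names are illustrative.

```python
from itertools import product

# Rows are drivers I-IV, columns are cars 1-4; Latin letter = additive, Greek letter = day.
SQUARE = [
    ["Aα", "Bβ", "Cγ", "Dδ"],
    ["Bδ", "Aγ", "Dβ", "Cα"],
    ["Cβ", "Dα", "Aδ", "Bγ"],
    ["Dγ", "Cδ", "Bα", "Aβ"],
]


def check_graeco_latin(square):
    """True if both letter layers are Latin squares and all pairs are distinct."""
    n = len(square)
    for k in (0, 1):  # 0: additive letters, 1: day letters
        layer = [[cell[k] for cell in row] for row in square]
        if len({s for row in layer for s in row}) != n:
            return False
        if any(len(set(row)) != n for row in layer):
            return False
        if any(len({layer[i][j] for i in range(n)}) != n for j in range(n)):
            return False
    pairs = {square[i][j] for i, j in product(range(n), repeat=2)}
    return len(pairs) == n * n


print(check_graeco_latin(SQUARE))
```

The ICG trials were planned around a square of this same 4 x 4 structure, with ROCID company organization, "friendly" company personnel, combat terrain and situation, and order of testing as the four factors.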




FROM  FIELD  EXPERIMENTATION  TO  SIMULATION: 

THE  FORTY  YEAR  QUEST  TO  UNDERSTAND  COMPLEX  SYSTEMS 

By 

Henry  C.  Alberts,  Professor  of  Acquisition  Management 
Defense  Systems  Management  College,  Fort  Belvoir,  Virginia  22060 


ABSTRACT 

From its founding in 1956, the experimental facilities established by the Army, first at Fort Ord
and then at Fort Hunter-Liggett, California, as the Combat Development Experimentation Center
(CDEC) have provided a unique laboratory for exploring the behavior of complex systems. At first
with the most rudimentary information-providing equipment, and later with more modern devices,
the events and relationships among elements of fighting forces were played out in the field in
disciplined activities that helped provide crucial insight (1) for our armed forces in combat, (2)
for those who devise operational tactics, and (3) for those who plan and design new combat
equipment.

This paper examines the years between 1962 and 1981 from the point of view of the
instrumentation capabilities used to provide data upon which analyses were based, and traces the
increasing sophistication of data collection and management devices throughout the period.


A  PERSONAL  VIGNETTE 

In 1956, the Army’s Chief of Ordnance contracted with the Pennsylvania State University (PSU)
to form a team to study then-available U.S. capability to defend the United States against threats
posed by Intercontinental Ballistic Missiles (ICBMs). I was a member of that study team. My
qualifications included: (1) research and experimental work in supersonic flow phenomena, which I
had done for the Army at the Ballistic Research Laboratory during the years 1949 through 1953;
(2) service as coordinator for Geophysical Research and Development involved with the U.S. Air
Force’s Guided Missile activities from 1953 to 1956; and (3) experience as Head of Operations
Research for AVCO Corporation’s Advanced Research and Development organization, the Air Force
contractor for design, fabrication, and test of the ATLAS ICBM re-entry body.

It was in connection with this latter work that I had performed a study of the vulnerability of
ICBM vehicles to existing anti-missile systems. The PSU principal investigator, Dr. Harold
Hipsch, Chairman of the Aeronautical Engineering Department, was extremely interested in my
experiences in the design and construction of re-entry vehicles. He wanted to develop a “time-line” of
events that could be used to estimate which of the multiple potential defense configurations
would likely be most effective. I had done this for the ATLAS missile, making estimates of
elapsed times between significant events. I had also examined operational sequences of missile
preparation, launch, flight, re-entry, and impact. I had attempted to find “hard data” relevant to
each phase of missile operations. But although there were attempts to collect measurements
during the normal course of development activity, there were no organized, consistent programs
whose objective was the disciplined design of reproducible sequences of events.
Consequently, the estimates of the duration of time-line events were derived from theoretical
considerations, which we later found to bear little resemblance to the actual operating context.

One of the PSU team members was a Lieutenant Colonel of Infantry. He spent his study team
time reminding us that we were working to improve the capability of “real soldiers,” and that the
theory we were developing would need to be applied to the real world of actual field maneuver.
But we all recognized that the capability to do that was limited by the existing, available field
facilities. One day, however, he informed us of the establishment of a new proving ground
complex in California which would explore the relationships of individual troops engaged in
simulated operational maneuvers.

I heard very little about the results of initial activities at CDEC until 1959, when I began to work
with Dr. William C. Pettijohn, who was then at Johns Hopkins University’s Operations Research
Office. One of AVCO’s products was a shell designed for the Army’s new M-79 grenade
launcher, and there had been difficulty in maintaining both the required CEP and the round-to-round
dispersion when firing production ammunition. As Head of Operations Research, I was
asked  to  look  at  the  problems  and  see  how  to  solve  them.  Dr.  Pettijohn  arranged  with  me  to 
perform  a  field  measurement  program  to  examine  the  characteristics  of  the  M-79  weapon  with 
specific emphasis on how well soldiers using it could aim their fire. When he completed the work,
we found that the sighting and aiming errors were so large that the requirements for tight CEP
and low round-to-round shell dispersion would severely limit weapon effectiveness: the aiming
errors were between 20% and 25% of the range to target! In the process of performing the
experimental work, Dr. Pettijohn became very interested in the concept of CDEC and how
CDEC’s  type  of  activity  might  materially  improve  Army  combat  capability.  He  moved  to  CDEC 
shortly  thereafter,  working  for  Stanford  Research  Institute  in  the  Fort  Ord  Research  Office. 

In 1960, I joined National Company, Incorporated, which had been engaged in developing state-of-the-art
communication equipment and super-accurate timing devices. One of the products was
the  first  “atomic  clock”.  That  particular  device  was  based  on  a  Rubidium  gas  standard  and  used 
the  quantum  energy  available  from  excitation  of  the  Rubidium  atoms  to  maintain  a  digital  counter 
accurate  to  tens  of  milliseconds  per  year.  We  used  the  clock  to  perform  a  test  of  the  capability  to 
synchronize  time  across  great  distances;  and  incidentally,  by  using  a  B-36  platform  in  continuous 
flight, we were able to check on Einstein’s Twin Paradox: the prediction that a rapidly moving
platform experiences the passage of time more slowly than one which is stationary. In 1961, I visited
Stanford  Research  Institute  to  brief  the  staff  on  the  results  of  what  we  had  called  “Operation  Time 
Tack.”  Dr.  Pettijohn  and  Scroggie  Wiley  provided  quid  for  the  pro  quo  by  talking  at  great  length 
about  the  difficulties  they  were  experiencing  in  collecting  time  related  data  at  CDEC.  Believing 
that I could contribute to that problem’s solution, I joined the SRI Research Office in October
1962 to work on improving the capability to instrument field activities and to permit collection of
integrated,  time-sequenced  position  location  and  event  information. 
STEPS IN THE PROCESS OF MEASURING REALITY

The  process  of  designing  an  instrumentation  system  for  CDEC  experimentation  began  with 
devising  a  plan  which  would: 

1. Measure: (a) the experimental time line on which all events could be placed; (b) the positions of all
the men and equipment on the field (which we grouped under the heading of “players”); (c) the
events which took place at every location in the field; and (d) the results of all engagements among
the players.

2.  Insofar  as  possible,  provide  a  degree  of  field  realism  to  try  to  make  players  respond  as  if  they 
were  in  real  combat  action. 

3.  Provide  the  capability  to  capture,  classify,  evaluate,  and  display  the  collected  information  in 
real  time  to  the  large  numbers  of  individuals  involved  with  directing,  monitoring,  and 
analyzing the field activities in progress; and to reconstruct the action repeatedly so it could
be  studied  in  detail. 

I  had  thought  that  the  fundamental  issue  of  providing  for  a  synchronous  experimental-test  time 
line  upon  which  to  place  each  event  taking  place  in  the  field  would  yield  easily.  After  all,  the 
Naval Observatory routinely broadcast the U.S. standard timing signals over WWV. But that
hope soon faded. Propagation anomalies made Fort Hunter Liggett unsuitable for the standard
broadcast systems then available for sending time signals over distance. In the end,
we  were  forced  to  provide  and  broadcast  a  timing  signal  from  the  experimental  area  to 
experimental  participants  -  and  even  that  specially  designed  system  could  not  provide  timing 
signals  throughout  the  experimental  terrain.  Nor  could  other  kinds  of  radio  signals  be  reliably  sent 
from  those  areas  to  a  control  center  location. 

For similar reasons, it was infeasible to use standard navigational systems such as LORAN or
TACAN  to  provide  position  measurements  at  all  player  locations.  We  were  required  to  construct 
our  own  triangulation  mechanism  and  to  devise  specific  kinds  of  player  modules  for  field  use. 

Additionally,  using  the  newly  developed  position-location  equipment  to  transmit  events  which 
took  place  at  the  players’  locations  turned  out  to  require  more  bandwidth  than  was  available  on  an 
already  restricted  transmission  frequency  set. 

The problem of marking engagement pairs and assessing the results was also challenging. Here
the difficulty was in determining whether an engagement could even have occurred: Did line of
sight exist between the two players? If an indirect-fire engagement was in progress, did the
settings  of  the  weapons  and  the  positions  of  potential  targets  enable  fire  coverage  in  the  particular 
parts  of  the  terrain  involved?  Lacking  the  motivation  of  live  fire,  were  actions  taken  by  target 
players  representative  of  their  responses  in  actual  combat? 
Displays, too, were unable to assimilate and process the large amounts of information that
would be collected in the field when all of the instrumentation was working and reporting at the data
repetition rates we thought we needed to perform analyses of the desired accuracy.


At  the  time,  I  characterized  the  problems  of  devising  a  useful,  real-time  field  data  collection 
system  as  “trying  to  do  21st  Century  science  under  Medieval  field  conditions!” 

By mid-1963, progress had been made in all technical areas required to provide the basis for
instrumentation development. And then, in the midst of it all, a debate arose between the
Ballistic Research Laboratories (BRL) at Aberdeen and those who were developing vertical
envelopment tactics in Vietnam over the survivability of large numbers of helicopters
operating in that environment. We were tasked to develop and execute an experimental plan to
obtain  data  on  the  effectiveness  of  ground  live-fire  against  helicopters.  One  part  of  the  program 
required us to develop live-fire targets that looked and maneuvered like UH-1B aircraft and that
could report having come under fire. In addition, if the targets were hit, they
would  also  report  the  location  of  projectile  entry  and  exit  so  that  the  probable  aim  and  firing 
information  could  be  captured. 

The component instrumentation for the test program was developed and in place in less than 9
months. Drone helicopter targets were constructed using reconditioned OH-13 units (purchased
from oil rig operators) fitted with a hit-sensitive skin which made them resemble a scaled-down
UH-1B, and carrying an array of microphones which permitted acoustic measurements of the
shock waves emanating from projectiles which approached the envelope of sensitivity around the
target. The firing itself took place at Fort Bliss, Texas, in late 1964 and 1965. Use of Fort Bliss
allowed us to use the position location and timing instrumentation in place on the Dona Ana
Range.  We  learned  a  lot  from  this  program  and  became  involved  in  a  debate  with  BRL  related  to 
the process of aiming and firing multiple-shot and automatic weapons. We predicted higher
survival  rates  for  vertical  envelopment  tactics  in  Vietnam  than  had  been  predicted  (and  accepted 
as  likely)  by  them  and  others.  When  our  estimates  were  confirmed  in  action,  we  felt  we  had  made 
a  real  contribution  to  our  fighting  forces. 

From the perspective of instrumentation, digital computers were in their early stages of
development at that time. Only in 1960 did digital processing become the preferred methodology for
performing complex computations. Prior to that time, analog computers were used to represent
systems  and  to  determine  results  of  varying  any  of  the  many  parameters  involved  in  their 
performance.  Only  eight  years  had  elapsed  since  Dr.  William  Shockley  had  demonstrated  the 
capability  of  doped  silicon  wafers  to  act  as  amplifying  devices.  The  entire  concept  of  digital 
communications  as  a  replacement  for  the  standard  analog  transmission  theory  and 
communications  construction  methodology  was  still  some  time  in  the  future.  In  many  ways, 
attempts  to  achieve  the  objective  of  providing  measurements  which  would  allow  us  to  understand 
very complex systems resulted in our having to advance the state of the art in sensing,
communications, display, and mathematical analysis. It didn’t seem as though we were on the
cutting edge of technical capability, although when we talked with others who were
attempting similar things, we found out we were. To us, it seemed as if the ancient Chinese curse
had come to pass: we were “living in interesting times!”

Although  there  was  significant  progress  toward  developing  a  capability  to  measure  field 
occurrences  and  perform  analysis  of  them,  we  found  that  we  were  doing  many  of  the  same 
experiments  over  and  over  again.  I  asked  the  Research  Board  to  consider  the  possibility  of 
constructing  a  series  of  experimental  building  blocks:  exercises  which  would  be  performed  under 
broad sets of conditions and then used as “ground truth” for those elements forever after. I
believed that Omar the Tentmaker was correct when he said in the Rubaiyat: “The moving finger
writes; and, having writ, moves on: nor all thy piety nor wit shall lure it back to cancel half a line,
nor all thy tears wash out a word of it!” How naive I was. I had no understanding then that, when
dealing with large complex systems, there is only limited reproducibility, if any, over time and
space.

As  we  continued  to  devise  instrumentation  to  capture  the  operational  world  as  represented  in  the 
field  at  CDEC,  we  began  to  sense  other  problems  made  visible  by  the  considerable  improvement 
we  had  made  in  measuring  sequences  of  events  on  a  consistent  time  line.  We  saw  that  there  was 
considerable variance in performance of set-piece tasks depending upon precedent and antecedent
tasks. We tended to minimize these variances and attribute them to “experimental error.” I
failed  then  to  grasp  the  full  meaning  of  what  I  saw.  Later,  I  would  be  able  to  place  it  all  into 
perspective  and  draw  insight  from  the  experience:  I  learned  that  outcomes  of  complex  events  are 
highly  dependent  on  the  task  sequences  and  the  life  experiences  of  individuals  involved  in  their 
performance. This would emerge more clearly in the work on Small Independent Action Forces,
discussed below.

In 1966, I left the Research Office to work with a former CDEC Commandant, BG Charles J.
Girard, who was assigned to Headquarters, Seventh Army, in Germany. SRI provided a team of
analysts to consider the problems involved in using information developed in yearly field
exercises. When instrumenting CDEC, I could foresee a day when there would be a plethora of
data: a time when an individual human being could not comprehend everything the
instrumentation would report. Psychologists call this kind of problem “cognitive overload,”
and it is a common occurrence in today’s world. In Germany I learned what too much data, both
organized  and  unorganized,  could  do  to  human  understanding.  We  had  just  about  completed  the 
work  in  Germany  when  Braddock,  Dunn,  and  McDonald  (BDM)  assumed  responsibility  for 
CDEC  support. 

I worked on a number of problems at SRI Menlo Park before moving to Sweden in October
1966. Once there, I worked on private sector problems. In the process of serving private sector
needs, I learned a great deal about the difficulty of obtaining, interpreting, and presenting data
so that it provides “information.” As a result, when I returned to the United States in 1968, I was
able to understand that there were extensive commonalities between the commercial and military
worlds: in both, it was difficult to deal with large information flows generated from real-time
observations of complex systems.

One  practical  illustration  of  how  “field  experimentation”  and  “field  exercise”  can  turn  into 
“simulation”  has  its  roots  in  the  CDEC  experience.  It  concerns  a  large  data  base  building  program 
based  on  data  collected  about  Small  Independent  Action  Forces  (SIAF)  operating  independently 
of battalion and division control in Vietnam. Small units had been seen to be more successful in
detecting and reporting enemy activities and in engaging when necessary, all while keeping casualty
counts below those of units operating within large force elements. The field experience of Army Long
Range Reconnaissance Patrols (LRRPs), Navy SEALs, and Marine Reconnaissance Units had
shown that their operating tactics provided an effective means of combating both the North
Vietnamese  Army,  and  Viet-Cong  forces.  The  people  at  ARPA  wanted  to  understand  exactly  how 
small  unit  force  actions  differed  from  larger  scale  fighting  and  to  use  that  understanding  to 
develop  better  force  deployment  and  action  tactics.  Dr.  Pettijohn  had  already  joined  ARPA’s 
support  contractor  team,  and  I  joined  him  there  in  1970.  Together,  we  spent  three  years  building  a 
data base which could describe quite accurately the way small units operated in Vietnam. Building
the data base required a five-step process:

1. Interviews were conducted with members of the U.S. SIAF units immediately after they had
returned  from  patrols.  Each  patrol  member  was  asked  to  reconstruct  the  entire  patrol 
experience  from  his  own  point  of  view.  The  applicable  terrain  maps  were  laid  out  and 
questions  were  asked  about:  (a)  the  Operational  Order;  (b)  the  actual  insertion;  (c)  how  the 
patrol  proceeded  on  the  ground;  (d)  how  fast  it  went  across  the  terrain;  (e)  how  many  enemy 
detections  were  made  which  did  not  result  in  engagement,  and  the  circumstances  under  which 
they  occurred;  (f)  the  fire  fights  (if  any)  which  resulted  from  enemy  detection  of  friendly 
forces either prior to or concurrent with detection by friendly forces, and the expenditure rates
of ammunition during those fire fights; (g) the external support required; (h) the withdrawal;
and,  (i)  the  perception  of  patrol  results.  Additional  interviews  were  conducted  with  small  units 
made  up  of  foreign  troops  who  were  operating  independently  of  larger  units  to  gain 
comparable  understanding  of  how  they  functioned  during  their  patrols. 

2.  Pictures  of  representative  terrain  were  shown  to  patrol  members  and  the  data  they  had 
provided  in  their  interviews  was  linked  to  the  terrain  type  over  which  the  patrol  proceeded 
during  each  time  increment.  Patrol  members  were  asked  to  explain  the  reasons  why  they 
would  select  a  movement  rate,  what  dictated  their  positions  during  both  movement  and  at 
rest,  their  estimate  of  the  density  of  enemy  troops  on  the  terrain,  and  the  way  in  which  the 
enemy  dispersed  and  moved  over  the  ground  during  the  patrol  period. 

3. A statistical analysis was performed to derive relationships among: (a) terrain types; (b) terrain
movement rates; (c) perceived enemy distributions over terrain; (d) detection occurrences and
probability for both friendly and enemy forces; (e) engagement probability given detection;
and (f) outcomes of engagements (including rates of ammunition expenditure, and numbers and
types of casualties).

4. Seasoned Vietnam veteran troops were used in a field experiment in the Hawaii National Forest,
an area on the island of Hawaii which resembled an area of Vietnam about which we had
gathered considerable information. “Enemy” troops were dispersed on the terrain in tactical
positions corresponding to those used by both North Vietnamese and Viet-Cong forces, and
these troops moved on the terrain tactically as those enemy forces would have done. Twenty-four small
independent action force patrols were asked to perform search and reconnaissance missions
over the terrain, and their movements and all other activities were monitored with considerable
accuracy. Each patrol was of five to eight days’ duration. At first light each morning, the data
collected  from  the  previous  day’s  activity  was  flown  to  Honolulu  and  processed  on  a  CDC 
6400  computer.  Results  were  returned  to  Hawaii  as  soon  as  they  were  obtained,  usually  prior 
to  3:00  P.M.  of  the  same  day.  Activities  for  the  next  day  were  determined  based  on  the 
totality  of  data  processed  up  to  that  time. 

5. A computer-assisted game was developed, complete with terrain film clips, operational
orders, and simulated enemy troop distributions and movements. The intent was to provide a
simulation of the experience captured within the data base. The simulation was applied to
twelve experienced Vietnam combat patrols at the Special Forces Training School. As the
simulations ran, data was taken about troop responses.

6.  The  data  from  Vietnam,  Hawaii,  and  the  Special  Forces  School  were  compared  to  see  if  each 
data  set  belonged  within  the  same  data  universe.  When  that  had  been  shown,  we  had 
considerable  confidence  that  we  would  be  able  to  test  new  tactics  in  simulated  Vietnam 
conditions without deploying large numbers of troops in field experiments specifically for that
purpose. 

When  the  SIAF  work  ended,  I  turned  to  other  kinds  of  data  collection  and  analysis  work.  With 
the exception of planning two more field experiments for the Marine Corps Development and
Education Center (MCDEC) during the period from 1974 through 1980, I was absent from the
military  field  experimentation  milieu. 

In  October  1980,  Scroggie  Wiley  called  me  and  asked  if  I  would  be  interested  in  returning  to 
CDEC  in  my  old  role  as  Director  of  Instrumentation.  I  was  delighted  to  do  so.  I  had  so  enjoyed 
my  time  at  CDEC  that  I  was  happy  with  the  idea  of  reliving  my  youth.  When  I  returned  in 
November 1980, I saw that the advances in the state of the art of field data collection we had pressed
for in the early and mid-1960s had been completed and were functioning. I looked
forward to defining the next generation of data collection, processing, and analytical devices and
to putting them into the field to achieve distributed systems: a kind of internet concept for
experimentation, in which simulations residing outside CDEC would be incorporated within
CDEC’s field experiments, and results obtained by exercising those simulations would be transmitted to
CDEC for use in ongoing experimentation. Even at that time, it seemed clear that the cost of
the  sort  of  activities  in  which  CDEC  has  been  historically  engaged  was  rapidly  becoming 
prohibitive.  And  it  was  also  clear  that  some  ideas  for  improvements  to  existing  CDEC 
instrumentation equipment had been institutionalized within the experimental community to the
extent that any major change in historical direction would be difficult to achieve.

About the middle of July 1981, it also became clear that my family was firmly anchored to the
eastern half of the United States. All of my children and grandchildren were there. With the desire
to maintain close family ties uppermost in my mind, I returned to Virginia in November 1981 and,
in 1983, joined the Defense Systems Management College as Professor of Engineering
Management  in  the  hope  of  helping  students  cope  with  the  very  complicated  business  of  design, 
development,  test,  production,  and  support  of  military  weapon  systems. 


AND  WHAT  ABOUT  TOMORROW? 


My grandchildren ask me about historical things. They say I am “Living History.” It is as hard
for  them  to  understand  the  world  in  which  I  grew  up  as  it  is  for  me  to  understand  what  life  was 
like  when  my  parents  were  young.  As  we  have  all  come  to  know,  perceptions  are  fact;  truth  is 
transient;  and  the  future  is  a  guess!  Notwithstanding  all  of  that,  I  would  like  to  make  some 
guesses to this particular audience about the effect of technology’s relentless advance on how we
gain understanding of complex activities (and warfare is certainly one of the most complex
and intense of human activities).

Marshall  McLuhan  [1]  provided  us  with  the  insight  into  how  the  Russian  empire  would  fall.  He 
projected  a  “global  village”  in  which  information  could  not  be  controlled  and  where  the  power  to 
see  what  was  happening,  as  it  was  happening,  would  inexorably  shape  world  events.  There  are 
few  who  would  deny  that  continuous  presentation  of  scenes  of  war  and  death  on  the  evening 
news  accompanying  dinner  was  a  major  force  in  shaping  the  policy  which  led  to  disengagement  in 
Vietnam.  And  my  Russian  friend  (whom  I  am  now  free  to  know  and  work  with)  tells  me  that  the 
USSR  was  doomed  the  first  time  Russian  citizens  saw  the  Western  way  of  life  on  television  and 
found  that  their  government  had  been  consistently  lying  to  them.  The  visual  evidence  of  “the  way 
things  are”  transcends  even  long-held  opinions  about  “the  way  I  have  been  told  things  are.” 

Similarly, it has been writers of fiction who have presented an “envisioned” future “outside of the
boundaries” of today’s realities. Perhaps they are the most likely to come close to the things which lie in
our future. Today, aircraft simulators present pilots with extremely realistic presentations of flight.
Technology has permitted progress from the first simulators used for pilot training in the World
War II era, through the more sophisticated devices created by the Naval Special Devices Center in
the late 1940s and early 1950s, to the combat training devices pilots use to prepare themselves
for flying the high-performance supersonic aircraft of today.
Just as Jules Verne [2] predicted a nuclear submarine many decades before nuclear energy was
even conceived of by science, and the many creators of the character of Buck Rogers were
correct about man’s voyages into space, Gene Roddenberry, the creator of the original “Star
Trek,” might have presented a vision of how warfare may be conducted in the future. In an
episode  of  “Star  Trek”,  the  crew  of  the  Enterprise  finds  itself  in  orbit  around  a  planet  which  is  at 
war  with  a  neighboring  planet.  In  this  episode,  the  war  is  fought  on  computers:  moves  are 
programmed  into  the  computers  of  both  combatants,  casualties  are  computed,  and  each 
government  responds  by  ordering  the  proper  numbers  of  people  to  be  killed.  While  we  might  not 
carry  realism  as  far  as  that,  we  can  now  create  very  realistic  displays  of  combat  which  present 
players with the illusion of direct personal involvement in a field action. We can bring together, in
a virtual network, many individual weapon simulators and devices which add to the verisimilitude
of  tactical  situations.  Communications  and  display  have  advanced  to  the  point  where 
Roddenberry’s  vision  can  be  implemented! 

But  there  is  a  further  possibility  in  the  immediate  future  that  can  do  even  better.  At  the  Vancouver 
World’s  Fair  of  1984,  the  story  of  the  development  of  British  Columbia  and  its  natural  resources 
was  presented  in  the  B.C.  Pavilion.  One  of  the  vignettes  (for  which  there  was  always  an  extended 
queue)  had  to  do  with  Indian  tribes  and  their  lives  before  the  settlers  came.  I  sat  in  my  seat  facing 
a  large  stage  shielded  by  curtains.  The  curtains  parted  and  an  Indian  village  was  revealed  complete 
with  a  complement  of  teepees,  a  lake,  and  groups  of  people  who  inhabited  that  area.  An  old 
Indian (a tribal elder) came on-stage and told the story of the village and life there during the days
before  settlement.  At  the  end  of  his  story,  all  of  the  other  people  walked  off  the  stage,  and  only  he 
was  left.  When  he  had  finished  his  story,  a  canoe  came  floating  in  from  the  rear  of  the  stage  and 
came  to  a  stop  in  front  of  him.  As  he  spoke  about  the  disappearance  of  the  Indian  way  of  life  and 
of  the  people  who  fulfilled  the  expectations  of  that  life,  he  slowly  climbed  an  invisible  ladder,  sat 
down  inside  the  canoe,  and  floated  off  at  the  rear  of  the  stage  as  the  village  and  the  scenery  ALSO 
disappeared from view. It was only after he had vanished and the empty stage was revealed that the
audience  became  aware  it  had  been  watching  a  holographic  spectacle  so  realistic  that  it  had  been 
mistaken  for  a  live  performance!  We  had  seen  “virtual  reality”  made  possible  holographically  to  a 
large  audience  who  were  so  immersed  in  the  illusion  that  they  felt  themselves  part  of  the 
spectacle. 

Imagine  what  can  be  done  today  with  computer  generated  virtual  reality!  Together,  linked 
interactive communication networks using distributed simulations blended together in computer-driven
three-dimensional (or even holographic) displays can make it possible to immerse
individuals  in  simulated  battle  so  realistically  that  their  responses  can  be  used  to  provide  highly 
reliable  indications  of  an  actual  battle  outcome.  The  capability  to  perform  parallel  computer 
processing  at  high  speeds  can  provide  seamless  simulations  of  sufficient  dimensionality  to  drive 
virtual  reality  displays  which  place  players  inside  of  “a  world  that  could  be.”  Field  exercises  or 
experiments  would  provide  confirmation  of  the  simulation  outcome  rather  than  basic  information 
from which a simulation could be constructed. Realism can be achieved by recreating battle
reality well enough to generate believable responses appropriate to the situations created.

Under  such  circumstances,  the  purposes  for  which  CDEC  was  established  40  years  ago  can  be 
fulfilled in many different localities, and the need to set aside large numbers of troops dedicated to
performing experimentation missions diminishes greatly. While there will likely need to
be  some  specially-instrumented  locations  specific  to  the  purpose  of  test  and  evaluation  of  real 
equipment  by  real  soldiers  on  a  real  terrain,  the  majority  of  that  work  will  most  likely  be  possible 
at  control-room  kinds  of  locations  distributed  throughout  the  United  States. 

In short, we will have achieved the capability to perform continuous, controlled experimentation
in an orderly exploration of the effects of changing equipment and tactics on battle outcomes!

As  for  the  CDEC  I  knew  and  loved,  perhaps  an  old  Latin  phrase  may  be  appropriate:  “Sic  transit 
gloria mundi!” For those of you who are not Latin scholars, it means: “Thus passes the glory of
the world.”


Henry  C.  Alberts 

Bethany Beach, Delaware, and Fort Belvoir, Virginia
October 1996
Abstract

This  report  outlines  methods  for  estimating  and  comparing  the  Loss  Exchange  Ratio  (LER)  output 
of computer combat simulations, and develops methods to establish a priori the number of simulation
runs required to detect a change of a given size in the parameters of the simulation.

The Loss Exchange Ratio (LER) is a widely used and widely accepted summary statistic for a
simulation run involving force-on-force combat models. The LER is surprisingly variable: multiple runs of
the same scenario produce a wide range of LER values.

We assert here that these loss exchange ratios are skewed random variables and
are well modeled by the inverse Gaussian (IG) distribution. We discuss the technical advantages of
the inverse Gaussian over other distributions, particularly the log-normal distribution.

The IG stochastic model allows us to develop methods for estimating the parameters of this
distribution, using its known sampling distributions. We also inherit the proper statistical tests
for hypothesis testing. Finally, we are able to determine a priori the number of runs necessary to
detect a change of a given size in the distribution. This is a particularly important capability,
given the increased reliance of the Army on these simulation models to ...

We discuss how these simulation tests fit into the larger scheme of procurement and doctrine. We
illustrate with data sets from both the JANUS and CASTFOREM simulations. In particular, we
find that the use of the IG model allows us to draw more powerful conclusions about the data.

We conclude that the IG is a good model for describing the variability of LER, with useful estimation
and testing properties, and we recommend its adoption. We sketch several other promising areas for
research which follow from the adoption of this model.

Key  Words:  Sample  size,  loss  exchange  ratios,  inverse  gaussian  random  variables,  JANUS,  CASTFOREM, 
simulation,  design  of  experiments 
Suppose two systems are being considered for acquisition. How does one tell whether they are worth the
cost of acquiring them, or what their benefits are? The question is particularly difficult if the systems
are from different battlefield operating systems, say an air defense weapon system and a communications system.

One strategy for comparing these systems is to model their characteristics and add them to an existing
“base case” force model. For example, we may have a force model which represents a battalion task force.
We adjust the model to reflect the addition of new, competing weapons systems. These new force models
are used in a suite of scenarios which are executed in a combat simulation, say JANUS or CASTFOREM.
The results of the simulation with the new force packages are compared to each other and to the base case,
and inferences are drawn about their relative merits. These merits, together with the costs of the systems,
can form the basis for rational choices using a cost-benefit analysis.

Such  comparisons  are  not  limited  to  system  acquisition:  doctrine  and  force  structures  can  also  be  modeled 
and  compared  using  this  simulation  approach. 

A related problem asks: what specifications should be required for a new system? One
approach is to construct a model which allows varying capability in the new weapon system, and to simulate
at various levels of this capability. One then chooses a response from the simulation and constructs a model
of the response as a function of the level of the capability. It is possible to construct response surface models
which examine the effects of making multiple changes to the base force model simultaneously. These models
help decide how much and which capability to buy. They also allow exploration of the interactions between
capabilities and the identification of any resulting synergies.

This  methodology  requires  us  to  select  the  outputs  of  these  simulations  for  comparison.  It  then  requires 
a  statistically  valid  means  of  modeling,  estimating,  and  comparing  these  responses. 

One of the conventional summary statistics for a combat simulation run is the loss exchange ratio (LER),
which is the ratio of enemy losses to friendly losses. While this statistic suffers from all the difficulties
associated with summarizing a very complex battle with one number, it has found wide acceptance in the
operations research community.

We will use the LER as the response variable for the purposes of this discussion. We note that the
methodology is general, and can be applied to other skewed, non-negative measures of effectiveness.

This paper has the following structure. In section 2, we discuss loss exchange variables and possible
models, adopting the inverse gaussian model. In section 3, we discuss estimation of LER parameters using
the inverse gaussian model. In section 4, we discuss hypothesis testing. In section 5, we discuss sequential
testing methods for LER. Next, in section 6, we examine the power of these tests and propose a simulation
method for determining the appropriate number of runs for a simulation study. We then look at two different
sets of simulation results in section 7. We close with conclusions and recommendations. A primer on the inverse
gaussian distribution is available from the author, and is omitted here in the interests of conserving space.
Figure 1: Histogram of loss exchange ratios for 80 simulations of a scenario simulated using CASTFOREM.
Data provided by TRAC-WSMR.

Loss  Exchange  Ratios 


The loss exchange ratio is a widely used summary statistic for combat models. It has theoretical underpinnings
in the work of Frederick Lanchester and his deterministic differential equation models of combat.


It is well known that the output of a computer simulation package such as JANUS or CASTFOREM is
variable. The exact same scenario can be simulated repeatedly on these models, and different (sometimes
strikingly different) outcomes may result. For example, Figure 1 shows the LERs of 80 runs of the same
scenario on the same computer using the same simulation package, CASTFOREM. The maximum LER was 4.5,
while the minimum was 0.69. The median was 1.5, while the mean was 1.69788; a mean well above the median
is a sign of right skew.


The variability and skewness in these data argue strongly against using the single summary statistic,
average LER. The data need to be described not only with a measure of location, but also with measures of
their dispersion and shape. For appropriate statistical description and analysis, we require a statistical
model. Lacking such a model, we cannot compare the outputs of the competing simulations: we cannot determine
whether the difference in response is due merely to chance.


Models 

There are several possible models for non-negative, skewed data. The log-normal, gamma, Weibull,
and inverse gaussian distributions immediately suggest themselves.

We desire our model to have several properties. First, the model must fit the data well. Second, the
distributions of the maximum likelihood estimators (MLEs) for the model parameters should be known and
tractable. At a minimum, we should be able to find the MLEs without resorting to numerical methods.
Third, the theory of estimation and testing for the model should be well developed. Fourth, the parameters
of the model should be easily interpretable.

We exclude the gamma and the Weibull distributions for failing to have the second property. The MLEs
for these distributions cannot be found explicitly and require numerical approximation, and the
distributions of the MLEs are not tractable.
Figure 2: A “QQ” plot of the logarithm of the WSMR data against normal quantiles.

The log-normal is a possible model. The distributions of its MLEs are known, and parallel those of the
normal distribution. However, there is a real practical difficulty which arises from the logarithmic
transformation of the data necessary to conduct statistical testing. Statements about the mean of the
transformed variable are not statements about the mean of the original variable, but rather about the
median of the original variable. The mean of the original variable is a function of both the mean and
variance of the transformed variable, so a direct test for equality of means of the original variable is
awkward at best. Similarly, statements about the variance of the original variable are complicated by the
fact that it is a function of both the mean and variance of the transformed variable.
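To make the difficulty concrete, the standard log-normal identities show how the median, mean, and variance of the original variable X depend on the parameters of ln X (the symbols m and s are introduced here only for illustration):

```latex
\ln X \sim N(m, s^2) \;\Longrightarrow\;
\operatorname{median}(X) = e^{m}, \qquad
E[X] = e^{m + s^{2}/2}, \qquad
\operatorname{Var}(X) = \left(e^{s^{2}} - 1\right) e^{2m + s^{2}}
```

A test of equal means on the log scale therefore compares medians of X; two populations with equal m but different s² have different means.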

We  prefer  a  model  which  fits  well  and  does  not  require  transformation,  so  that  the  parameters  are 
immediately  useful.  As  we  discuss  in  the  next  section,  we  choose  the  inverse  gaussian  distribution. 

Why  Inverse  Gaussian? 

The inverse gaussian distribution is a positively skewed distribution with two parameters, μ and λ: μ is the
mean of the distribution, and λ is a shape parameter. The MLEs are known, and their distributions involve
only the inverse gaussian and chi-squared distributions. Statistical tests for equality of μ and λ involve
only the t and F distributions.
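For reference, the standard form of the IG(μ, λ) density is shown below. Its mean is μ itself, which is why the parameter can be read directly as the mean LER without any transformation:

```latex
f(x;\mu,\lambda) = \sqrt{\frac{\lambda}{2\pi x^{3}}}\,
  \exp\!\left(-\frac{\lambda\,(x-\mu)^{2}}{2\mu^{2}x}\right),
  \qquad x > 0,\quad \mu > 0,\quad \lambda > 0,
\qquad E[X] = \mu
```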

The inverse gaussian distribution fits the data sets we display in this report at least as well as the
log-normal. The difference between the two is in the behavior of the left tail, where the log-normal tends to
underestimate the quantiles.

For example, Figure 2 is a “QQ” plot of the logarithm of the data set from Figure 1 against normal quantiles,
which is equivalent to checking the data against the log-normal distribution. Notice that the data are more
heavy tailed to the right than the normal quantiles would suggest. Similarly, Figure 3 shows that the histogram
of the transformed data is still skewed to the right. Figure 4 shows the inverse gaussian density with the MLEs
as parameters, and this model fits the tails better.

The  graphical  evidence  in  Figures  2,  3,  and  4  is  supported  by  more  formal  goodness  of  fit  testing  using 
the  Wilks-Shapiro  statistic. 

We  make  the  assumption  for  the  balance  of  this  paper  that  the  LER  data  is  well  modeled  by  the  inverse 
gaussian  distribution,  with  parameters  given  by  the  maximum  likelihood  estimates. 
Figure 3: A histogram of the logarithm of the WSMR base data set. A non-parametric smooth has been
applied to the data. Notice that the data are still skewed to the right, suggesting that the log-normal is
inappropriate.
Figure 4: Histogram of the WSMR base data with best fitting IG density.


Estimation 


The MLE of the mean of the IG distribution is the sample average X̄, and its distribution is X̄ ~ IG(μ, nλ).
This allows us to construct confidence intervals for the mean of the LER. These confidence
intervals are more accurate than ones based on the asymptotic normal approximation given by the central limit
theorem, because the data are more heavy tailed than the normal distribution. Application of the standard
x̄ ± kσ/√n interval results in an unnecessarily wide confidence interval for the mean.

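The shape parameter λ has MLE given by

$$ \hat{\lambda} = \frac{1}{V}, \qquad V = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{1}{X_i} - \frac{1}{\bar{X}}\right) \qquad (1) $$

This estimator is a function of the sufficient statistic V.

The distribution of nλV is chi-squared with n − 1 degrees of freedom, independent of X̄. This allows
confidence intervals to be constructed for λ based on the χ² distribution.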

For  further  details,  the  reader  is  referred  to  the  primer  in  the  appendix. 

The key point is that the distributions of these MLEs involve only the IG and the χ² distributions: they
are very tractable, and the actual estimates are easily computed.

Estimating the shape parameter seems particularly noteworthy, as the skewness and variance of
the LER are not routinely reported. The variance of the IG(μ, λ) distribution is μ³/λ and its skewness
is 3√(μ/λ), so as the shape parameter increases, both the variability and the skewness decrease.

Closed form expressions for confidence intervals for μ and λ are available in Chhikara and Folks [1989],
and again are based on quantiles of standard distributions.

These confidence intervals are narrower than ones based on the asymptotic normal distribution. For
example, consider the WSMR base data. Using a standard normal approximation for the mean, which follows
from the central limit theorem, we obtain the 95% confidence interval

$$ \mu \in (1.53895,\ 1.8568) = \left(\bar{x} \pm 1.96\,\hat{\sigma}/\sqrt{n}\right) \qquad (2) $$

Using the IG model and the formulas given in Section 9, we obtain a tighter confidence interval for μ, which
also recognizes the skewed nature of the data:

$$ \mu \in (1.56257,\ 1.85883) \qquad (3) $$

As  a  result,  we  have  a  more  precise  estimate  of  the  mean,  given  the  available  data. 
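A minimal Python sketch of an exact interval for μ with λ unknown follows. It assumes the interval is obtained by inverting the one-sample test statistic T = √(n−1)(X̄ − μ)/(μ√(X̄V)), with V = (1/n)Σ(1/Xᵢ − 1/X̄); solving |T| ≤ t_crit gives an interval that is symmetric on the reciprocal scale, 1/μ ∈ 1/X̄ ± t_crit·√(V/((n−1)X̄)), and therefore skewed to the right for μ. Consistent with this, the reciprocals of the endpoints of Equation (3) are centered on 1/1.69788 to within rounding. The function names are hypothetical, and the caller supplies the t quantile with n − 1 degrees of freedom, since the Python standard library has no t distribution.

```python
import math


def ig_sufficient_stats(data):
    """Return (n, xbar, V) for a sample of positive LER values.

    V = (1/n) * sum(1/x_i - 1/xbar); the MLE of the shape is 1/V.
    """
    n = len(data)
    xbar = sum(data) / n
    v = sum(1.0 / x - 1.0 / xbar for x in data) / n
    return n, xbar, v


def ig_mean_ci(n, xbar, v, t_crit):
    """Exact interval for the IG mean with the shape unknown (a sketch).

    Inverts |T| <= t_crit with T = sqrt(n-1)*(xbar - mu)/(mu*sqrt(xbar*V)).
    The interval is symmetric for 1/mu, so it is skewed to the right for mu.
    """
    c = t_crit * math.sqrt(v / ((n - 1) * xbar))
    lower = 1.0 / (1.0 / xbar + c)
    # If c >= 1/xbar, the data cannot bound mu from above.
    upper = math.inf if c >= 1.0 / xbar else 1.0 / (1.0 / xbar - c)
    return lower, upper


# Hypothetical LER values, for illustration only (not the WSMR data).
sample = [0.9, 1.2, 1.5, 1.6, 2.1, 2.8]
n, xbar, v = ig_sufficient_stats(sample)
print("lambda_hat =", 1.0 / v)
print("95% interval for mu:", ig_mean_ci(n, xbar, v, t_crit=2.5706))  # t quantile, 5 df
```

Working from the summary statistics (n, X̄, V) rather than the raw data mirrors the paper's emphasis on sufficient statistics: the interval depends on the data only through them.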

We obtain similar confidence intervals for the λ parameter.

We mention in passing that we are averse to confidence intervals for the mean, preferring instead to
assume a Bayesian model with a noninformative prior distribution, which results in probability intervals for
the mean. Such Bayesian methods are also outlined in Section 9.
Hypothesis Testing


The uniformly most powerful unbiased tests for the equality of two inverse gaussian population means are
known. We consider here the case where neither the mean nor the shape parameter is known. References for
the other cases are in Section 9.


The rejection region is a function of the sufficient statistics for each sample, X̄ and V, and the critical
points are given by the t distribution. Details are given in the primer in the appendix, Section . . .

The uniformly most powerful test for the equality of the shape parameters is a function of the sufficient
statistic V for each sample, and follows the F distribution. Again, details are in the primer in Section . . .
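The primer's exact expressions are not reproduced here, so the following is a hedged Python sketch using standard forms that we assume correspond to them. For two IG samples with a common shape λ, the "analysis of reciprocals" gives F = (N − 2)·B/W, where B = n₁/X̄₁ + n₂/X̄₂ − N/X̄ (X̄ the pooled mean) and W is the pooled sum of 1/Xᵢⱼ − 1/X̄ᵢ; under equal means F follows F(1, N − 2), so √F is compared with a t critical value on N − 2 degrees of freedom. For the shapes, the ratio of W₁/(n₁ − 1) to W₂/(n₂ − 1) follows F(n₁ − 1, n₂ − 1) under equal λ. All function names are hypothetical.

```python
def _recip_stats(data):
    """Return (n, xbar, W) with W = sum(1/x_i - 1/xbar) for one sample."""
    n = len(data)
    xbar = sum(data) / n
    w = sum(1.0 / x - 1.0 / xbar for x in data)
    return n, xbar, w


def ig_two_sample_mean_f(x, y):
    """F statistic for H0: mu_x = mu_y, assuming a common shape lambda.

    Under H0 this is F(1, N - 2); its square root plays the role of |t|.
    """
    n1, m1, w1 = _recip_stats(x)
    n2, m2, w2 = _recip_stats(y)
    big_n = n1 + n2
    grand_mean = (n1 * m1 + n2 * m2) / big_n
    # Between-sample term; non-negative because 1/x is convex.
    between = n1 / m1 + n2 / m2 - big_n / grand_mean
    within = w1 + w2
    return (big_n - 2) * between / within


def ig_shape_f(x, y):
    """F statistic for H0: lambda_x = lambda_y; F(n1 - 1, n2 - 1) under H0."""
    n1, _, w1 = _recip_stats(x)
    n2, _, w2 = _recip_stats(y)
    return (w1 / (n1 - 1)) / (w2 / (n2 - 1))
```

Both statistics use only each sample's size, mean, and sum of reciprocal deviations, so they can be computed from archived summaries of earlier simulation runs.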

These  tests  allow  us  to  test  if  the  means  and  shapes  of  two  samples  are  statistically  equivalent.  In 
the  context  of  our  problem  of  comparing  the  output  of  two  combat  simulations,  they  allow  us  to  test  the 
hypothesis  that  the  outputs  came  from  identical  processes. 

Moreover, since these tests are based on well fitting distributions, they are more powerful than
asymptotically based tests. We see in the examples that these tests allow us to show statistically different
results where the asymptotic methods do not.


The result is that we can make more powerful inferences from the simulations we do run, which
saves computational expense and makes more efficient use of those simulations. For large
simulations, this can result in significant economies.


Significance tests also exist, one sided and two sided, for the mean with λ both known and
unknown. Significance tests are also known for the equality of λ with the mean both known and unknown.
Additionally, there are two sample versions of the above tests. These cover the usual possibilities, involve
only the IG, t, and F distributions, and allow simple implementation of exact tests. These tests are
outlined in Chhikara and Folks [1989].


For example, consider the WSMR base data. We wish to test the hypothesis that μ = 1.5 against the
alternative hypothesis that μ ≠ 1.5. The test statistic, from Section 9, follows a t distribution with n − 1
degrees of freedom. We have 80 data points, so with 79 degrees of freedom our critical value is
t_crit = 1.99045 at the 0.05 significance level.


We compute the value of the statistic, with hypothesized mean μ0 = 1.5 and V as in Equation (1), and obtain:

$$ t = \frac{\sqrt{n-1}\,(\bar{X} - \mu_0)}{\mu_0\sqrt{\bar{X}\,V}} = 3.032 > 1.99 \qquad (4) $$

We reject the hypothesis that μ = 1.5. This accords with the results of our previous section, where 1.5 was
not included in our 95% confidence interval for μ, given in Equation 3.
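The computation in Equation (4) can be sketched in Python as follows. The statistic is T = √(n−1)(X̄ − μ₀)/(μ₀√(X̄V)) with V = (1/n)Σ(1/Xᵢ − 1/X̄); under H₀, T² follows F(1, n − 1), so |T| is compared with the two-sided critical value from a t table (the text uses 1.99045). The function names are hypothetical, and since the raw WSMR data are not reproduced here, no numerical output is claimed.

```python
import math


def ig_mean_t_statistic(data, mu0):
    """T for H0: mu = mu0 when the IG shape parameter is unknown (a sketch)."""
    n = len(data)
    xbar = sum(data) / n
    # V = (1/n) * sum(1/x_i - 1/xbar); zero only if all values are equal.
    v = sum(1.0 / x - 1.0 / xbar for x in data) / n
    return math.sqrt(n - 1) * (xbar - mu0) / (mu0 * math.sqrt(xbar * v))


def reject_mean(data, mu0, t_crit):
    """Two-sided test: reject H0 when |T| exceeds the t critical value."""
    return abs(ig_mean_t_statistic(data, mu0)) > t_crit
```

A positive T means the sample mean lies above μ₀, which is the direction of the WSMR result.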
Figure 5: A graphical description of the SPRT.


Sequential  Testing 

It  is  possible  to  test  if  the  means  of  two  combat  models  are  equivalent  using  sequential  methods.  In  these 
methods,  one  does  not  predetermine  the  number  of  simulation  runs,  but  rather  samples  until  one  can  make 
a  decision.  The  classic  method  is  the  sequential  probability  ratio  test. 

Wald conjectured [Wald, 1947] and later proved [Wald and Wolfowitz, 1948] that the sequential probability
ratio test (SPRT) is optimal for deciding between two point hypotheses, in the sense that, among tests with
the same error probabilities, the SPRT minimizes the expected number of points sampled before a decision is
reached. A precise statement of these optimality properties of the SPRT in a decision framework can be found
in [Ferguson, 1967].

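The SPRT considers

$$ \Lambda_n = \frac{f(X_1, X_2, \ldots, X_n \mid \theta_1)}{f(X_1, X_2, \ldots, X_n \mid \theta_0)} = \prod_{i=1}^{n} \frac{f(X_i \mid \theta_1)}{f(X_i \mid \theta_0)} \qquad (5) $$

where f(x|θ) is the joint or marginal density as appropriate, and the product form holds for independent runs.
The SPRT accepts H0: θ = θ0 if Λn ≤ A, accepts Ha: θ = θ1 if Λn ≥ B, and otherwise continues sampling. This
is illustrated in Figure 5, with limits of −3 and 3 on the log scale, where the null hypothesis would have been
rejected at observation number 4.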

In practice, we work with the log-likelihood ratio, ln(Λn), which is a cumulative sum. We accept,
reject, or continue sampling based on the value of this cumulative sum. As we have written it, the
log-likelihood ratio will have a negative expected value when the process is in control, that is, under the
null hypothesis. When the process is well modeled by the alternate hypothesis, the log-likelihood ratio will
have a positive expected value. As a result, when the process is in control, the sum tends downward; when the
process is out of control at the alternative distribution, the sum tends upward. When the sum is above a
certain limit, we have evidence in favor of the alternative hypothesis. When the sum is below a certain limit,
we decide in favor of the null hypothesis. When the sum is between the limits, we continue to sample.

In the present context, we would apply the SPRT as follows. We would first have our estimate of the base
case parameters, which would determine θ0. We would then select the shift in the parameter for which we
desire maximum sensitivity. For example, say our estimate of the mean for the base case is μ0, and suppose
further that we wished maximum power to detect whether the mean had shifted to μ1 = 2.00. We would construct
the SPRT with those two point hypotheses, and sample until we reached a conclusion.

The values of the upper and lower limits for the SPRT are set after considering the desired performance
of the test in terms of type I (α) and type II (β) errors. Exact methods are available, but the usual
approximation, due to Wald, is to set A = β/(1 − α) and B = (1 − β)/α.
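The procedure can be sketched in Python for the IG mean. This is a minimal sketch under stated assumptions: the shape λ is treated as known (for instance, held at the base-case estimate), so θ is the mean alone; the runs are independent; and the limits are Wald's approximations for a ratio written as f(x|θ₁)/f(x|θ₀), ln A = ln(β/(1 − α)) and ln B = ln((1 − β)/α), which coincide with any other arrangement when α = β = 0.05. For a common λ, each run adds (λ/2)[x(1/μ₀² − 1/μ₁²) − 2(1/μ₀ − 1/μ₁)] to the log-likelihood ratio. The names `ig_llr_increment` and `sprt` are hypothetical.

```python
import math


def ig_llr_increment(x, mu0, mu1, lam):
    """log f(x | mu1) - log f(x | mu0) for IG densities with common shape lam."""
    return 0.5 * lam * (x * (1.0 / mu0**2 - 1.0 / mu1**2)
                        - 2.0 * (1.0 / mu0 - 1.0 / mu1))


def sprt(observations, mu0, mu1, lam, alpha=0.05, beta=0.05):
    """Wald SPRT on a stream of LER values (a sketch, lam assumed known).

    Returns ("accept H0" | "accept H1" | "continue", runs used).
    """
    log_a = math.log(beta / (1.0 - alpha))   # lower limit, ln A
    log_b = math.log((1.0 - beta) / alpha)   # upper limit, ln B
    total = 0.0
    for i, x in enumerate(observations, start=1):
        total += ig_llr_increment(x, mu0, mu1, lam)  # cumulative sum ln(Lambda_n)
        if total <= log_a:
            return "accept H0", i
        if total >= log_b:
            return "accept H1", i
    return "continue", len(observations)
```

In use, one would feed LER values from successive simulation runs into `sprt` and stop making runs as soon as it returns a decision.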

By using sequential methods, one is guaranteed (with probability one) to reach a decision, and to do so in the
fewest average number of simulations. This avoids the situation where one runs, say, the usual number of
trials, fails to reject the null hypothesis, yet does not know whether 5 more trials would have resulted in the
rejection of the null. It also avoids the need to do the power calculations discussed next.


Number  of  runs 


To determine the number of simulation runs necessary to detect a difference in a parameter of a given size
with a given probability, the usual course is to use the power function for the test. The power functions of
the inverse gaussian distribution test statistics are not known, however, because the non-central distributions
of the test statistics are intractable. In this section, we sketch an approximate method for determining the
number of simulation runs necessary.

We assume that we have historical data on the current model, with summary statistics given by X̄, V_x,
and n_x. This corresponds to the knowledge we would have about the current model after n_x runs.

First, we need to specify two models and two error probabilities: the current model, the smallest model change
that we wish to detect, the probability of type I error (rejecting the null when it is true), and the probability
of type II error (accepting the null when it is false).

For example, we could identify our current model as represented by the WSMR base case data. We want
the probability that we incorrectly say that the model has changed, when it remains the same, to be less
than 5%. We desire to be 95% sure that we detect a model shift to μ = 2.00, with λ remaining constant. In
other words, we want α = 0.05 and β = 0.05. How many trials should be run?

Our setup consists of two samples, one known and one to be drawn. Here the known sample is the WSMR
data. We want to know how large the sample still to be drawn should be.

Under the null hypothesis that the means are equal, the distribution of the test statistic T given by
Equation ?? is known to be the t distribution. As a result, we can compute our critical value for the test
statistic. For the WSMR data, with its large sample size, we can approximate the critical value by 2.00,
regardless of the size of the second sample.

Under the alternate hypothesis, μ = 2.00. We can repeatedly draw samples of size n, compute T, and
find the approximate probability that T < t_crit. This gives us an empirical estimate of β, the probability
that we do not detect the model shift to μ = 2.00 when it has occurred.

Routines for these simulations are easily implemented. One such LISP implementation is available from
the author.
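The author's LISP routine is not reproduced here. As a hedged stand-in, the Python sketch below follows the procedure just described under stated assumptions: the base sample is summarized by X̄, V_x, and n_x; λ is held at the base-case MLE 1/V_x under the alternative; the signed two-sample statistic is taken to be the square root of the analysis-of-reciprocals F statistic for a common λ (our assumption, since the equation the text refers to is not recoverable here); and IG variates are generated with the Michael, Schucany and Haas (1976) transformation method. All function names are hypothetical.

```python
import math
import random


def ig_sample(rng, mu, lam):
    """One IG(mu, lam) draw via the Michael-Schucany-Haas transformation."""
    nu = rng.gauss(0.0, 1.0)
    y = nu * nu
    x = (mu + mu * mu * y / (2.0 * lam)
         - (mu / (2.0 * lam)) * math.sqrt(4.0 * mu * lam * y + (mu * y) ** 2))
    # Choose between the two roots with probability mu / (mu + x).
    return x if rng.random() <= mu / (mu + x) else mu * mu / x


def signed_t(n1, m1, w1, sample):
    """Signed root of the two-sample reciprocal F statistic (common lambda)."""
    n2 = len(sample)
    m2 = sum(sample) / n2
    w2 = sum(1.0 / x - 1.0 / m2 for x in sample)
    big_n = n1 + n2
    grand = (n1 * m1 + n2 * m2) / big_n
    between = n1 / m1 + n2 / m2 - big_n / grand
    f = (big_n - 2) * between / (w1 + w2)
    return math.copysign(math.sqrt(max(f, 0.0)), m2 - m1)


def estimated_power(xbar, v, n_x, mu_alt, n, t_crit=2.00, trials=1000, seed=1):
    """Fraction of simulated second samples of size n with T > t_crit."""
    rng = random.Random(seed)
    lam = 1.0 / v        # shape held at the base-case MLE
    w1 = n_x * v         # sum(1/x_i - 1/xbar) for the base sample
    hits = 0
    for _ in range(trials):
        sample = [ig_sample(rng, mu_alt, lam) for _ in range(n)]
        if signed_t(n_x, xbar, w1, sample) > t_crit:
            hits += 1
    return hits / trials
```

To reproduce the search in the text, one would evaluate `estimated_power(xbar, v, 80, 2.00, n)` for several candidate values of n and interpolate. Because the statistic here is an assumed stand-in, its estimates need not match the counts reported for the LISP routine.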


For example, we return to the WSMR base-case data. How many runs do we need to make to be 95%
sure of detecting a change this large?

We set n = 200. Of a thousand trials, 977 have a value of T greater than 2.00. We set n = 180. Then
965 of a thousand trials have a value of T > 2.00. We set n = 150, and find 937 of a thousand trials have a
value of T > 2.00. We could apply a bisection method or a simple interpolation to find that we need to set
n ≈ 165 to achieve our desired design.
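The interpolation step is simple arithmetic: a linear interpolation between n = 150 (937 of 1000) and n = 180 (965 of 1000) to the target of 950 of 1000 gives

```latex
n \approx 150 + (180 - 150)\,\frac{0.950 - 0.937}{0.965 - 0.937}
  = 150 + 30 \cdot \frac{13}{28} \approx 163.9
```

which rounds up to the figure of about 165 used in the text.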

We  note  that  these  simulations  take  a  few  minutes  to  run  on  a  personal  computer,  but  are  much  quicker 
than  the  corresponding  JANUS  or  CASTFOREM  simulations. 

We  have  found  the  simulation  community  is  generally  unaware  of  the  large  number  of  simulation  runs 
required  to  have  high  power  for  hypothesis  tests  when  the  underlying  distribution  is  as  variable  and  skew  as 
the  distribution  of  LER. 


Examples 


We  present  two  short  examples  to  support  the  ideas  in  this  report.  The  first  data  set  was  provided  by 
Mr.  Dave  Durda  of  TRAC-WSMR,  and  is  called  the  WSMR  data  throughout  this  paper.  The  second  was 
provided  by  Mr.  Tom  Herbert  of  the  RAND  corporation,  and  is  called  the  RAND  data. 


WSMR 

TRAC-WSMR is responsible for stochastic combat simulation models. One of their models is CASTFOREM.
There has been discussion recently about adjusting the way that CASTFOREM assesses damage to systems
represented in the model. One proposal was to model degraded states, where instead of a system having
a binary state space (“killed” or “not killed”), the system could take on one of several states representing
reduced capability.

Three  new  types  of  rules  were  proposed,  along  with  one  base  case.  We  call  them  the  base  case,  and  cases 
one  through  three.  There  was  interest  in  whether  or  not  these  different  rules  affected  the  performance  of 
CASTFOREM,  and  if  so,  by  how  much. 

We were provided with the results of 260 simulations of the different rules in CASTFOREM using a
standard scenario. The base case and cases two and three were run 80 times each through the standard
scenario. Case one, the old rules, was only run 20 times. We call the base case data “WSMR”, the old rule data
“WSMR1”, and the two other methods “WSMR2” and “WSMR3”.

Boxplots of the LERs from the simulation for each of the models are in Figure 6. We see immediately
from the boxplots that the old rules clearly produce different results from the three new rules; the graphical
evidence is compelling and sufficient. We move on to the question of whether or not the three new methods
produce similar results.
Figure 6: Boxplot of the four WSMR data sets. From left to right, they are the base case, the old binary
rules, and two modifications to the base case. Source: White Sands Missile Range, 1996.


The  data  sets  were  each  found  to  be  well  modeled  by  the  IG  distribution. 

We  compute  the  confidence  intervals  for  the  means  of  WSMR,  WSMR2  and  WSMR3,  and  obtain: 

$$ \mu_{\mathrm{WSMR}} \in (1.5635,\ 1.858) \qquad (6) $$

$$ \mu_{\mathrm{WSMR2}} \in (1.410,\ 1.667) \qquad (7) $$

$$ \mu_{\mathrm{WSMR3}} \in (1.3833,\ 1.6408) \qquad (8) $$

It  appears  from  the  confidence  intervals  that  the  WSMR2  and  WSMR3  means  are  indistinguishable.  Can 
they  be  distinguished  from  the  base  case? 

Applying the two sample test developed earlier, we find that WSMR and WSMR2 are not statistically
significantly different, as the value of the resulting t statistic is only t = 1.7468. The test for equality of
means between WSMR and WSMR3, however, has a t statistic value of t = 2.071, which is significant at
the 0.05 level. We conclude that the WSMR3 set of rules for degraded states has a statistically significantly
different impact on LER than the WSMR rules.

We note in conclusion here that if we naively apply the two sample t test which would follow from
the inappropriate assumption that WSMR and WSMR3 were normally distributed, or from an asymptotic normal
approximation based on the central limit theorem, we would obtain t = 1.62, and we would not detect the
model differences. Our methods are more powerful than the asymptotic normal approximation.


RAND 

We have a second group of data sets, provided by RAND. These data came from trials of a new
weapon system. Three scenarios were run. In the base case, a blue battalion task force attacked a defending
red battalion task force. In the second case, the attackers were augmented with a new weapons system.
In the third case, the attackers were augmented with two new weapons systems. The simulators sought
to demonstrate that the LER was significantly better (from the blue point of view) with the new weapons
systems.




Figure  7:  Boxplots  for  the  RAND  data.  Source:  RAND  Corporation,  1996. 




Figure 8: Histogram of the first RAND data set with fitted IG density.


RAND conducted thirty runs of each case.

Boxplots for the three cases are presented in Figure 7. From the boxplots, it is again clear, without any
formal statistics, that the new weapons systems help the blue force. We can obtain confidence intervals
for μ and λ to emphasize the point.

We  prefer  to  dwell  on  a  different  point:  despite  the  difference  in  combat  simulations  between  JANUS 
and  CASTFOREM,  both  produce  distributions  of  LER  which  are  well  modeled  by  the  inverse  gaussian 
distribution.  We  present  some  graphical  evidence  in  Figures  8,  9  and  10.  Formal  testing  using  Wilks-Shapiro 
and  Kolmogorov-Smirnov  tests  supports  this  graphical  evidence. 


Conclusions  and  Recommendations 


This  is  a  quick  summary  of  the  main  points  of  this  paper. 
Figure 9: Histogram of the second RAND data set with fitted IG density.
Figure 10: Histogram of the third RAND data set with fitted IG density.


Conclusions 

• The inverse gaussian distribution fits LER data well.

• The inverse gaussian distribution provides a complete theory of estimation, hypothesis testing, and
design of simulation studies for the use of the analyst. This theory is largely based on standard
distributions, such as the t, normal, and χ² distributions, which are accessible to analysts.

• Methods based on the inverse gaussian distribution are more powerful for analysis of LER problems
than methods based on asymptotic normality.

• Using the methods of this paper, it is possible to easily and accurately approximate the number of
simulation runs necessary to detect a change in the mean or shape of the distribution of LER results.


Recommendations 

Loss  exchange  ratios  should  be  modeled  as  inverse  gaussian  random  variables  in  studies  where  high  statistical 
precision  is  desired. 

Further applications of this model should be studied. One promising area is the development of regression
models which predict the LER for a given level of a JANUS or CASTFOREM parameter associated with some
system capability. This could allow the acquisition community to decide on a desired level of capability
before setting specifications for systems procurement and design. In particular, regression models based on
the inverse gaussian distribution with several predictors seem fruitful for future study.


References 

[ 1 ] Banerjee, Asit K. and G. K. Bhattacharyya. (1979) Bayesian Results for the Inverse Gaussian Distribution
with an Application. Technometrics. Vol. 21. No. 2. pp. 247-251.

[ 2 ] Bickel, Peter J. and Kjell A. Doksum. (1977) Mathematical Statistics. Englewood Cliffs, NJ: Prentice
Hall.

[ 3 ] Chhikara, R. S. (1975) Optimum tests for the comparison of two inverse Gaussian distribution means.
Australian Journal of Statistics. Vol. 17. pp. 77-83.

[  4  ]  Chhikara,  R.  S.  and  J.  L.  Folks.  (1976)  Optimum  test  procedures  for  the  mean  of  first  passage  time 
in  Brownian  motion  with  positive  drift  (inverse  Gaussian  distribution).  Technometrics.  Vol.  19.  pp. 
189-193. 

[  5  ]  Chhikara,  R.  S.  and  J.  L.  Folks.  (1977)  The  Inverse  Gaussian  Distribution  as  a  Lifetime  Model. 
Technometrics  Vol.  19.  No.  4.  pp.  461-468. 

[  6  ]  Chhikara,  R.  S.  and  J.  L.  Folks.  (1989)  The  Inverse  Gaussian  Distribution.  New  York:  Marcel  Dekker. 

[ 7 ] Chhikara, Raj S. and Irwin Guttman. (1982) Prediction Limits for the Inverse Gaussian Distribution. Technometrics. Vol. 24. No. 4. pp. 319-324.

[  8  ]  Desmond,  A.  F.  and  G.  R.  Chapman.  (1993)  Modeling  Task  Completion  Data  with  Inverse  Gaussian 
Mixtures.  Applied  Statistics.  Vol.  42.  No.  4.  pp.  603-613. 

[ 9 ] Dupuy, Trevor N. (1987) Can We Rely Upon Computer Combat Simulations? Armed Forces Journal International. August. pp. 58-63.

[ 10 ] Edgeman, Rick L., Robert C. Scott, and Robert J. Pavur. (1988) A Modified Kolmogorov-Smirnov Test for the Inverse Gaussian Density with Unknown Parameters. Communications in Statistics: Simulation and Computation. Vol. 17. No. 4. pp. 1203-1212.

[ 11 ] Ferguson, Thomas S. (1967) Mathematical Statistics: A Decision Theoretic Approach. New York: Academic Press.

[ 12 ] Folks, J. L. and R. S. Chhikara (1978) The Inverse Gaussian Distribution and its Statistical Application - A Review. Journal of the Royal Statistical Society B. Vol. 40. No. 3. pp. 263-289.

[ 13 ] Fries, Arthur. (1986) Optimal Design for an Inverse Gaussian Regression Model. Statistics and Probability Letters. Vol. 4. pp. 291-294.

[ 14 ] Geisser, Seymour (1993) Predictive Inference: An Introduction. New York: Chapman & Hall.




[ 15 ] Helmbold, Robert L. (1990) Rates of Advance in Historical Land Combat Operations. Bethesda, Maryland: US Army Concepts Analysis Agency.

[ 16 ] IMSL, Inc. (1989) MATH/LIBRARY: FORTRAN Subroutines for Mathematical Applications. Edition 1.1. Houston: IMSL, Inc.

[ 17 ] Hughes, Wayne P., editor. (1984) Military Modeling. Alexandria, VA: Military Operations Research Society.

[ 18 ] Lanchester, F. W. (1956) Mathematics in Warfare. In The World of Mathematics. Vol. 4. Edited by J. R. Newman. New York: Simon and Schuster. pp. 2138-2157.

[ 19 ] Olwell, David H. (1996) Topics in Statistical Process Control. Ann Arbor: University Microfilms.

[  20  ]  Savage,  I.  R.  (1962)  Surveillance  Problems.  Naval  Research  Logistics  Quarterly.  Vol.  9. 

[ 21 ] Schroedinger, E. (1915) Zur Theorie der Fall- und Steigversuche an Teilchen mit Brownscher Bewegung. Physikalische Zeitschrift. Vol. 16. pp. 289-295.

[ 22 ] Taylor, H. M. (1965) Statistical Control of a Gaussian Process. Technometrics. Vol. 9.

[  23  ]  Taylor,  James  G.  (1981)  Force-on-Force  Attrition  Modeling.  Arlington,  VA:  Operations  Research 
Society  of  America. 

[ 24 ] Taylor, James G. (1983) Lanchester Models of Warfare. Volumes I and II. Arlington, VA: Operations Research Society of America.

[  25  ]  Tierney,  Luke.  (1990)  LISP-STAT:  An  Object  Oriented  Environment  for  Statistical  Computing  and 
Dynamic  Graphics.  New  York:  Wiley. 

[  26  ]  Tweedie,  M.  C.  K.  (1957a)  Statistical  properties  of  inverse  Gaussian  distributions  I.  Annals  of  Math- 
ematical  Statistics.  Vol.  28.  pp.  362-377. 

[ 27 ] Tweedie, M. C. K. (1957b) Statistical properties of inverse Gaussian distributions II. Annals of Mathematical Statistics. Vol. 28. pp. 696-705.


[ 28 ] Varian, Hal R. (1975) A Bayesian Approach to Real Estate Assessment. In Studies in Bayesian Econometrics and Statistics in Honor of Leonard J. Savage. Eds. Stephen Fienberg and Arnold Zellner. Amsterdam: North-Holland. pp. 195-208.


[ 29 ] Ventisel, Ve. S. (1964) Introduction to Operations Research. Moscow: Soviet Radio Publishing House.
[  30  ]  Wald,  Abraham  (1945)  Sequential  Tests  of  Statistical  Hypotheses.  Annals  of  Mathematical  Statistics. 
Vol.  16. 

[ 31 ] Wald, Abraham. (1947) Sequential Analysis. New York: Wiley.

[ 32 ] Wald, Abraham, and Jacob Wolfowitz. (1948) Optimum character of the sequential probability ratio test. Annals of Mathematical Statistics. Vol. 19. pp. 326-339.

[ 33 ] Zellner, Arnold. (1986) Bayesian Estimation and Prediction Using Asymmetric Loss Functions. Journal of the American Statistical Association. Vol. 81. No. 394. pp. 446-451.




CASTFOREM  VERIFICATION  AND  VALIDATION  PROCESS 


Douglas  C.  Mackey 

TRADOC  Analysis  Center-White  Sands  Missile  Range 
White  Sands  Missile  Range,  New  Mexico  88002-5502 


ABSTRACT 

This paper describes the past, present, and future verification and validation (V&V) efforts for the Combined Arms and Support Task Force Evaluation Model (CASTFOREM). CASTFOREM is the Army's brigade-level, high-resolution land combat simulation model. It has been used in numerous studies and cost and operational analyses and, as such, has undergone an elaborate verification and validation of its data and algorithms.

The  generalized  verification  and  validation  processes  will  be  discussed,  specific  examples  will  be  provided,  and 
then  a  history  of  all  efforts  will  be  listed. 

The ability to simulate reality is a challenge that may never be met, but it will always be a goal. CASTFOREM strives to meet the challenge by using a continuous V&V process. The summation of many V&V efforts, over many years of use, has earned CASTFOREM a high degree of credibility in the Army modeling community.

INTRODUCTION 

CASTFOREM models all types of direct fire, crew-served ground weapons systems; helicopters; dismounted infantry (fire teams); artillery (ICM, guided munitions, smart munitions, smoke); engineering operations (minefields, barriers, and breaching); combat service support functions (rearm, refuel); communications (including networks); maneuver with dynamic route selection; detailed search and acquisition (multiple sensors using Night Vision and Electronic Sensors Directorate (NVESD) modeling); and a realistic battlefield (smoke, dust, weather, the Army Research Laboratory-Battlefield Environment Directorate's (ARL-BED) COMBIC model, and digitized terrain). CASTFOREM is highly flexible, both in what it can model and in the degree of resolution to which an object or process is modeled.

Each organizational entity (commanders and units of resolution, e.g., tanks, infantry fighting vehicles (IFV), and trucks) possesses its own intelligence system, which is updated by the acquisition of information via a communication net or directly (detecting a target, encountering an obstacle, receiving fire, etc.). Delays and failures in the exchange of information over a communication net cause each entity's intelligence system to hold perceived battlefield knowledge rather than perfect knowledge. The latter, however, can be represented by simulating perfect and instantaneous exchange of information among organizational entities.

In general, all combat support and combat service support units and functions that interact with and/or directly affect the combat activities of maneuver units are represented in the model. The degree of resolution to which units and their functions are modeled is greatest for maneuver units, less for combat support units, and least for combat service support units. However, the CASTFOREM structure makes it easy to increase the resolution with which specific vehicles, weapons, and functions are represented to satisfy user study objectives.

The CASTFOREM scenario preparation process closely parallels the methodology of the military planning process for a tactical operation. This is accomplished through the construction of knowledge bases (via decision tables) for both Red and Blue. Each knowledge base is designed for a specific type of tactical operation (e.g., active defense, deliberate attack, hasty river crossing); contains doctrinal responses to a broad spectrum of tactical situations; requires user threshold inputs to trigger each doctrinal response; and permits dynamic maneuver by opposing forces.
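To make the idea of threshold-triggered decision tables concrete, here is a minimal Python sketch. It assumes, for illustration only, that each rule pairs one observed situation variable with a user-supplied threshold and a doctrinal response, and that rules are checked in priority order. The variable names, thresholds, and responses are hypothetical and do not reproduce CASTFOREM's actual decision-table format.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (situation variable, user threshold, doctrinal response).
Rule = Tuple[str, float, str]


def evaluate_decision_table(rules: List[Rule],
                            situation: Dict[str, float]) -> Optional[str]:
    """Return the response of the first rule whose variable meets its threshold."""
    for variable, threshold, response in rules:
        if situation.get(variable, 0.0) >= threshold:
            return response
    return None  # no rule fired; the unit continues its current order


# Illustrative knowledge base for an active defense (values are made up).
ACTIVE_DEFENSE: List[Rule] = [
    ("friendly_loss_fraction", 0.4, "withdraw to alternate position"),
    ("enemy_vehicles_detected", 10.0, "request artillery fire"),
]

if __name__ == "__main__":
    print(evaluate_decision_table(ACTIVE_DEFENSE,
                                  {"friendly_loss_fraction": 0.1,
                                   "enemy_vehicles_detected": 12.0}))
```

Placing rules in priority order is one simple way to resolve cases where several thresholds are met at once. A real knowledge base may resolve such conflicts differently.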

CASTFOREM comprises the following process modules:

•  Command  and  Control  (C2) 

•  Communications (COMMO)

•  Combat Service Support (CSS)

•  Engineer  (ENGR) 

•  Surveillance  (SEARCH) 

•  Engage 

•  Maneuver 

•  System/Environment 

The model contains the C2 (inference engine) logic, which accesses the knowledge base to make tactical decisions that generate orders, reports, and requests for support. In turn, these decision table outputs control the actions of units of resolution. This logic, combined with explicit representation of a C2 structure and communication nets, represents the C2 process employed by combat units.

The resolution of CASTFOREM is at the individual vehicle (e.g., a tank, an APC, or a truck) or individual soldier, and there are no artificial limits on the sizes of the forces played. Typical battles last from 30 minutes to 3 hours.

Figure 1 portrays the fundamental cycle of integration over time for each CASTFOREM unit.

Initially, each unit receives its first combat orders. These may direct the unit to move, search, communicate, etc. The unit determines whether it is feasible to execute the order and, if so, schedules its completion. After time has been charged, an assessment is computed as to the event completion (e.g., reached destination), and a decision table may be executed to determine the next appropriate order.
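The unit cycle just described (receive an order, check feasibility, schedule completion, charge time, assess, and consult a decision table) maps naturally onto a discrete-event loop. The Python below is a hypothetical sketch of that cycle and not CASTFOREM code. It assumes each order has a fixed duration and that the decision table simply names the next order. The `feasible` callback and all order names are illustrative placeholders.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple


def run_order_cycle(initial_orders: List[Tuple[str, str]],
                    durations: Dict[str, float],
                    decision_table: Dict[str, Optional[str]],
                    feasible: Callable[[str, str], bool],
                    end_time: float) -> List[Tuple[float, str, str]]:
    """Simulate the receive-order / schedule / charge-time / assess cycle.

    Returns a log of (completion_time, unit, order) for each completed order.
    """
    events: List[Tuple[float, int, str, str]] = []  # (completion time, tie-breaker, unit, order)
    counter = 0
    for unit, order in initial_orders:
        if feasible(unit, order):  # unit checks whether it can execute the order
            heapq.heappush(events, (durations[order], counter, unit, order))
            counter += 1
    log: List[Tuple[float, str, str]] = []
    while events:
        clock, _, unit, order = heapq.heappop(events)  # charge time up to completion
        if clock > end_time:
            break
        log.append((clock, unit, order))  # assessment: the event is complete
        next_order = decision_table.get(order)  # decision table picks the next order
        if next_order is not None and feasible(unit, next_order):
            heapq.heappush(events, (clock + durations[next_order], counter, unit, next_order))
            counter += 1
    return log
```

For example, a tank ordered to "move" (10 time units), then "search" (5), then "communicate" (1) would log completions at 10, 15, and 16. A priority queue keeps events from many units in time order.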

GENERALIZED  VERIFICATION  AND  VALIDATION  PROCESS 

The definition of V&V from AR 5-11, the techniques of the V&V process, and configuration control for CASTFOREM are described below.

In accordance with AR 5-11, we have the following definitions:

Verification, in the context of this regulation and Army Model Improvement Program (AMIP) models, is defined as a technical review of a model's algorithms to ensure their suitability for the model's intended purpose. Such a review must be designed to determine if algorithms are technically sound, consistent with current approved analytical techniques, and appropriate to the model design.

Validation, in the context of AR 5-11 and AMIP models, refers to an iterative process designed to determine whether the model/simulation reflects results expected in the real world. It must be recognized that, due to the complex nature of the real world, no validation effort can be expected to be totally accurate. Nonetheless, by approaching validation through a logical sequence of iterative steps (outlined in the paragraph below), an evaluation of a model's approximation of reality can be obtained.

Verification and validation are indeed a continuous process over time. For CASTFOREM, every time a new algorithm is introduced, it undergoes a V&V process to ensure it is "reasonable."

The verification process is fairly straightforward, as outlined below. It is validation that is difficult.

Complicating  the  validation  effort  are  the  following: 

•  quantifying  human  factors  for  input  to  the  model 

•  modeling  futuristic  weaponry 

•  benchmark  field  tests  may  not  represent  actual  battlefield  conditions 

•  historical  data  from  actual  battles  may  not  be  representative  of  future  battles 

Absolute validity will never be achieved, but it always remains a goal for CASTFOREM.

To  help  facilitate  algorithmic  verification,  validation,  and  consistency,  reference  (3)  was  published  by  AMSAA. 
This  compendium  of  high  resolution  attrition  algorithms  describes  CASTFOREM's  algorithms  in  detail. 

The following observations from references 1 and 2 all apply to CASTFOREM.

"Without validation a model is of very little use. The concepts of inductive and deductive reasoning are introduced, and it is shown that it is impossible to validate models in the strictest sense of the word. Modeling is not a precise science; hence the criteria used for testing the robustness of scientific [...] applied to models at the present time. Here, 'validation' means substantiating that the model, within its domain of applicability, is sufficiently accurate for the intended applications. The emphasis is on [...] confidence in the model rather than testing for its absolute validity, and this is achieved [...] support the validity of concepts, methodology, data, experimental results, and inference. Model sponsors, builders, and model users must be prepared to accept compromise solutions."

"The ease or difficulty of the validation process depends on the complexity of the system being modeled and on whether a version of the system currently exists. For example, a model of a neighborhood bank would be relatively easy to validate since it could be closely observed. On the other hand, a model of the effectiveness of a naval weapon system in the year 2025 would be virtually impossible to validate completely, since the location of the battle and the nature of the enemy weapons would be unknown."

"A simulation model of a complex system can only be an approximation to the actual system, regardless of how much effort is put into developing the model. There is no such thing as an absolutely valid model."

"A simulation model should be validated relative to those measures of performance that will actually be used for decision making."

"... model development and validation should be done hand-in-hand throughout the entire simulation study."

VERIFICATION  PROCESS 

Data  verification  techniques  are: 

•  Identification  of  data 

•  Traceability of data to approved sources (e.g., Ballistic Research Laboratory (BRL), Army Materiel Systems Analysis Activity (AMSAA), etc.)

•  Analysis  of  the  use  of  the  data 

Algorithm  verification  techniques  consist  of: 

•  Running parametric sensitivities on the algorithm in a stand-alone environment and as an integral part of CASTFOREM, and then analyzing the outputs vis-a-vis the inputs. Output is analyzed from a numerical, statistical, and behavioral perspective to determine whether the first- and higher-order effects of the algorithm have surfaced as intended by the modeler.

•  Structured  walk-through  of  the  stand-alone  algorithm  and  its  model  interfaces.  This  technique  enables  all 
personnel  involved  to  come  to  the  same  plane  of  understanding  regarding  expected  model  outputs.  It  provides  the 
opportunity  for  the  designer,  coder,  and  reviewer  to  make  a  detailed  review  of  the  coded  algorithms'  structures  to 
ensure  that  the  algorithm  functions  as  intended  by  the  modeler  and  that  the  necessary  dynamic  data  interactions  take 
place  properly. 

•  Day-to-day  checkout  of  code  by  the  programmers. 

•  Day-to-day checkout of scenarios by the analysts.

•  Briefing new algorithms at the annual CASTFOREM users' group meeting.
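Running parametric sensitivities, the first algorithm verification technique above, amounts to perturbing inputs and checking that the output responds in the expected direction and by a plausible amount. The Python below is a minimal one-at-a-time sketch of that idea. The model it perturbs is a hypothetical stand-in and not a CASTFOREM algorithm: a Lanchester square-law calculation of surviving Blue forces, chosen only because its answer is easy to check by hand. The 10 percent step size is an arbitrary choice.

```python
import math
from typing import Callable, Dict


def blue_survivors(params: Dict[str, float]) -> float:
    """Stand-in model: Blue survivors under Lanchester's square law.

    With dB/dt = -r * R and dR/dt = -b * B, the quantity b*B^2 - r*R^2 is
    conserved, so if Blue wins it ends with sqrt(B0^2 - (r / b) * R0^2).
    """
    b0, r0 = params["blue_force"], params["red_force"]
    b, r = params["blue_kill_rate"], params["red_kill_rate"]
    remaining = b0 ** 2 - (r / b) * r0 ** 2
    return math.sqrt(remaining) if remaining > 0 else 0.0


def one_at_a_time_sensitivity(model: Callable[[Dict[str, float]], float],
                              base: Dict[str, float],
                              delta: float = 0.10) -> Dict[str, float]:
    """Relative change in model output when each input alone is raised by delta."""
    y0 = model(base)  # assumed nonzero at the base case
    changes = {}
    for name in base:
        perturbed = dict(base)
        perturbed[name] = base[name] * (1.0 + delta)
        changes[name] = (model(perturbed) - y0) / y0
    return changes


if __name__ == "__main__":
    base = {"blue_force": 100.0, "red_force": 60.0,
            "blue_kill_rate": 0.05, "red_kill_rate": 0.05}
    print(one_at_a_time_sensitivity(blue_survivors, base))
```

A verifier would check that raising Blue's force or kill rate increases Blue survivors, and that raising Red's does the opposite. Any sign reversal would flag an algorithm or data error.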

VALIDATION  PROCESS 


Data validation is accomplished by ensuring that the data are representative of some empirical standard, to attain consistency and reasonableness. AMSAA is key to this.

Algorithm  validation  is  accomplished  by,  once  again,  running  parametric  sensitivities  on  the  algorithm  in  a 
stand-alone  environment  and  as  an  integral  part  of  CASTFOREM.  Then,  one  or  several  of  the  following  techniques 
are  applied  to  ensure  "reasonableness"  and  consistency  of  the  model  output: 

•  Field  test  comparisons 

•  Comparison  of  results  to  historic  data  or  other  model  results  (benchmarking) 

•  Independent review, either by designated committees or by functional area experts, to determine if model results are "reasonable"

•  Study  advisory  groups  (SAGs) 

•  Peer review groups within the study process

CONFIGURATION CONTROL


Frequently, new algorithms or updates to old algorithms are presented for CASTFOREM. This section describes the process by which changes are brought into CASTFOREM.


Figure 2 portrays CASTFOREM configuration control. In general, modifications are desired by a user, either local [...]. Modifications are coded and checked out in a test environment. Once they are ready for integration into CASTFOREM, TRADOC Analysis Center-White Sands Missile Range (TRAC-WSMR) conducts a V&V effort.

Figure 3 portrays TRAC-WSMR's V&V review process.


Figure  3.  TRAC-WSMR's  V&V  Review  Process 


First, the "current" version of the model is replicated and post-processed with a benchmark scenario.

Second, the new code is compiled into the model, providing a "test" version. The test version is then replicated and post-processed.

Third, the results of each set of runs are compared. If the comparison is favorable, the new code is moved to the permanent modification file and documented.
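The paper does not say how the "current" and "test" replications are compared, or what counts as a favorable comparison. One common choice, sketched here as an assumption, is Welch's two-sample t statistic on a replicated measure of effectiveness such as the loss exchange ratio. The acceptance threshold `t_crit = 2.0` and both function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def welch_t(current: Sequence[float], test: Sequence[float]) -> Tuple[float, float]:
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(current), len(test)
    v1 = statistics.variance(current) / n1  # squared standard error, current version
    v2 = statistics.variance(test) / n2     # squared standard error, test version
    t = (statistics.mean(test) - statistics.mean(current)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def comparison_is_favorable(current: Sequence[float], test: Sequence[float],
                            t_crit: float = 2.0) -> bool:
    """Hypothetical acceptance rule: no significant shift in the replicated MOE."""
    t, _ = welch_t(current, test)
    return abs(t) < t_crit
```

This check assumes at least two replications per version with nonzero spread. A change that is intended to alter results, such as a new capability, would instead be judged against the expected direction and size of the shift.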

The  algorithm  and  its  supporting  data  undergo  all  applicable/possible  V&V  efforts  described  above.  Once  it  is 
agreed  that  the  new  algorithm/data  produce  the  correct  effect,  the  code  is  integrated  into  CASTFOREM. 

Algorithm  integration  into  CASTFOREM  consists  of: 

•  Documenting  the  routines  added/modified 

•  Saving  the  old  code  as  backup 

•  Moving the new code to the reference version

VERIFICATION AND VALIDATION EXAMPLES


EXAMPLE 1. CASTFOREM COMPARISON TO CARMONETTE MODEL

CARMONETTE was the previous high-resolution model used widely by the Army community. It had gained a high degree of credibility over the years and was considered the benchmark simulation.

Benchmarking  CASTFOREM  against  CARMONETTE  was  an  expedient  way  of  "inheriting"  some  of 
CARMONETTE’s  credibility. 


The Armor Investment Strategy (AIS) Study was chosen as the first benchmark. This study was run in CARMONETTE with three main alternatives: Blue tank lethality, Blue tank accuracy, and ITOW. It was then rerun using CASTFOREM, and the findings of the study were the same in both models.


More importantly, not only did CASTFOREM provide end game statistics comparable to CARMONETTE, but Blue and Red losses over time were also comparable (see figure 4). This showed that the battle evolved over time in the same way in both models, which was a major milestone for CASTFOREM.
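One simple way to check that the battle "evolved over time in the same way" is to compare cumulative loss curves from the two models on a common time grid. The sketch below is illustrative only. It assumes each model reports the times at which one side's systems were killed, and it summarizes agreement by the largest absolute gap between the two curves. Figure 4 itself is not reproduced here, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence


def cumulative_losses(kill_times: Sequence[float], grid: Sequence[float]) -> List[int]:
    """Number of systems killed at or before each time on the grid."""
    ordered = sorted(kill_times)
    return [bisect_right(ordered, t) for t in grid]


def max_curve_gap(kills_a: Sequence[float], kills_b: Sequence[float],
                  grid: Sequence[float]) -> int:
    """Largest absolute difference between two cumulative-loss curves."""
    a = cumulative_losses(kills_a, grid)
    b = cumulative_losses(kills_b, grid)
    return max(abs(x - y) for x, y in zip(a, b))
```

In practice the curves would be averaged over replications and computed separately for Blue and Red losses before comparing.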


EXAMPLE 2. FAADS MODEL-TEST-MODEL

This  was  a  large  effort  to  compare  the  pedestal  mounted  Stinger  (PMS),  MANPADS  Stinger  teams,  and 
LOS-F-H  to  the  field.  Day,  night,  MOPP,  NOMOPP,  CUED,  and  AUTONOMOUS  cases  were  all  run.  The 
ranges  of  detection,  engagements,  and  intercepts  were  compared  to  the  field. 

In general, intercept and engagement ranges compared favorably with the field, but detection ranges did not. The collection of detection ranges in the field was difficult, and the NVL detection model produced detections at much shorter ranges than those observed in the field. Some sample results for PMS and MANPADS are provided in figure 5.

Figure 5. Model-Test-Model Effort

This  effort  won  the  Wilbur  B.  Payne  award. 

EXAMPLE  3.  SIMNET-D  M1A2  SYNTHETIC  ENVIRONMENT  EXPERIMENT 
This  was  an  experiment  using  SIMNET-D  and  man-in-the-loop  M1A2  simulators. 

Several modeling insights were gained. CASTFOREM implemented implicit and explicit target cueing, a reduction in the usage of the CITV (from 100 percent previously), reorienting the hull toward the enemy to increase survivability, and new tank gunner disengage logic. This logic disengages a tank gunner only when the target stops moving and firing.
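The revised disengage rule can be stated as a one-line predicate. This Python sketch is a hypothetical rendering of the rule as described in the text, not CASTFOREM code. The `TargetState` fields are assumed names for what the gunner perceives about the target.

```python
from dataclasses import dataclass


@dataclass
class TargetState:
    """Hypothetical snapshot of what the gunner perceives about a target."""
    is_moving: bool
    is_firing: bool


def should_disengage(target: TargetState) -> bool:
    """Disengage only when the target has stopped both moving and firing."""
    return not target.is_moving and not target.is_firing
```

Under this rule, a target that is stationary but still firing keeps the gunner engaged.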

EXAMPLE 4. LONGBOW IOTE LINKAGE TO COEA (1995)

This is one of the most recent efforts. It compared CASTFOREM results to a similar scenario flown in the IOTE.

Similarities were shown in the percentage increase of loss exchange ratios, number of kills per system, survivability between the base case and Longbow, and helicopter tactical sequences.
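The percentage increase in loss exchange ratio between the base case and the Longbow alternative is a simple calculation. The sketch assumes the common convention that LER is enemy losses divided by friendly losses. The loss counts in the usage lines are made up for illustration and are not results from the study.

```python
def loss_exchange_ratio(enemy_losses: float, friendly_losses: float) -> float:
    """Assumed definition: enemy systems killed per friendly system lost."""
    if friendly_losses <= 0:
        raise ValueError("LER is undefined when there are no friendly losses")
    return enemy_losses / friendly_losses


def percent_increase(base: float, alternative: float) -> float:
    """Percentage change of an alternative relative to the base case."""
    return 100.0 * (alternative - base) / base


if __name__ == "__main__":
    base = loss_exchange_ratio(30, 10)      # hypothetical base-case losses
    longbow = loss_exchange_ratio(45, 10)   # hypothetical alternative losses
    print(percent_increase(base, longbow))  # 50.0
```

Computing the same percentage from both the simulation and the IOTE puts the two sources on a common, scale-free footing. This matters because the absolute loss counts in a field test and a model run are rarely comparable.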

Timelines did not compare favorably, and this is still an open question.

VERIFICATION AND VALIDATION EFFORTS


VERIFICATION 


Reference 2 provides several techniques for verification that have been used extensively in CASTFOREM. (There are many good references on V&V; I chose references 1 and 2 as representative.)

•  Technique  1:  "In  developing  a  simulation  model  write  and  debug  the  computer  program  in  modules  or 
subprograms." 

This was a coding convention imposed on the original coding team of CASTFOREM, and it remains in force today. There are nine major modules of code: surveillance, maneuver, combat service support, engineer, engage, communications, command, control, and the system. Each of these, in turn, has numerous submodules.


•  Technique 2: "It is advisable when developing large simulation models to have more than one person read the computer program."

During past and current development of CASTFOREM, any coding done by a team member was always reviewed by the chief programmer, at a minimum. Over the past several years, due to employee turnover, various modules of code have been passed to new team members who, in turn, begin by flow charting the module. This has provided an excellent "second look."

•  Technique 3: "The model should be run under simplifying assumptions for which its true characteristics are known."

Whenever a new algorithm is being tested in CASTFOREM, it is analyzed using a one-on-one or few-on-few scenario that is tailored to the new coding. This allows the programmer to compare the result to a hand-calculated result. The algorithm is then tested further in a many-on-many scenario. Finally, it is tested on several large "high resolution" scenarios.

•  Technique  4:  "With  some  types  of  simulation  models,  it  may  be  helpful  to  observe  an  animation  of  the 
simulation  output." 

This is a very important part of verification for CASTFOREM. CASTFOREM has an elaborate playback capability that allows a user or coder to visually inspect a playback of a battle to determine its overall integrity.
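Technique 3 above, comparing a one-on-one run with a hand-calculated result, can be illustrated with the simplest possible engagement. Assume a shooter fires up to `shots` independent rounds, each of which kills the target with probability `p_single`. The hand-calculated kill probability is then 1 - (1 - p)^k, and a Monte Carlo replication of the same engagement should agree with it to within sampling error. This is a hypothetical verification check written in Python, not a CASTFOREM scenario.

```python
import random


def hand_calculated_pk(p_single: float, shots: int) -> float:
    """Probability of at least one kill in `shots` independent shots."""
    return 1.0 - (1.0 - p_single) ** shots


def simulated_pk(p_single: float, shots: int, replications: int, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity from a one-on-one engagement."""
    rng = random.Random(seed)
    kills = 0
    for _ in range(replications):
        # The shooter stops firing as soon as a shot kills the target.
        if any(rng.random() < p_single for _ in range(shots)):
            kills += 1
    return kills / replications


if __name__ == "__main__":
    print(hand_calculated_pk(0.3, 3))   # about 0.657
    print(simulated_pk(0.3, 3, 20000))  # should be close to 0.657
```

If the simulated value falls outside a few binomial standard errors of the hand calculation, the coded algorithm, rather than the scenario, is the first suspect.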


VALIDATION 

Reference 2 provides a three-step approach for developing a valid and credible simulation model. The three steps are:

1) Developing high face validity via expert review, peel backs, briefings, and comparisons to already accepted models.

2) Empirical testing of model assumptions using sensitivity analysis.


3)  Comparing  model  output  to  field  test  data. 
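For step 3, one way to compare a distribution of model outputs (for example, detection or engagement ranges) with field test data is the two-sample Kolmogorov-Smirnov statistic. The Python sketch below computes that statistic along with the usual large-sample 5 percent critical value. It illustrates a standard technique and is not the comparison method used in the CASTFOREM efforts listed in the tables; the function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_two_sample(sample_a: Sequence[float], sample_b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # Both empirical c.d.f.s are step functions that jump only at sample points,
    # so the maximum gap is attained at one of the pooled observations.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)
        fb = bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value_05(n: int, m: int) -> float:
    """Large-sample 5% critical value, 1.358 * sqrt((n + m) / (n * m))."""
    return 1.358 * math.sqrt((n + m) / (n * m))
```

A D value above the critical value would signal that model and field ranges differ by more than sampling noise. That is the kind of discrepancy reported above for detection ranges in the FAADS effort.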

Table  I  is  a  chronological  listing  of  all  major  algorithmic  V&V  efforts  for  CASTFOREM. 

Table II lists all major studies CASTFOREM has successfully completed. Table III outlines some of the major combat simulations that CASTFOREM has been compared to in some detail. Table IV lists all field tests and battles that CASTFOREM has been compared against.


As one can see, CASTFOREM has, over the years, undergone an enormous amount of V&V, which continues on a daily basis. As a consequence, CASTFOREM has earned a high degree of credibility in the Army modeling community.

Table I. CASTFOREM Algorithm V&V Efforts (Continued)

Cumulative Damage    AMSAA    Yes    A2R2 - Still under review
Heuristic            Others
(2 Kills K-Kill)


Table II. Past Studies

Year     Study (S)/COEA (C)/Reimbursable (R)
85       FADEWS (S)
85       STINGER Proficiency (S)
86       Armor Investment Strategy (S)
87       M1A2 (C)
87       FAADS (C)
87       Infantry Anti Armor Weapon System (IAAWS) (C)
88       FAADS FDTE (M-T-M)
88/89    LHX (C)
88/89    Longbow (C)
89/90    WAM (C)
89/90    Armored Systems Modernization (ASM) (C)
90       AMPAW/ARDEC (R)
91       Auto Tracker (S)
91       Legal Mix VII (C)
91       Army Mortar Master Plan (R)
92       STINGRAY (C)
92       FAADS (C)
92       LOSAT Countermeasures (R)
92       Lightweight Laser Designator/Rangefinder (C)
92/93    Battlefield Combat ID System (BCIS) (C)
92/93    AFAS (C)/SADARM (C)
93       Counter Battery vs NLOS (R)
93       JAVELIN IOTE (S)
93       Division Air Defense (S)
93       Second Generation FLIR (C)
93       Guardian Task Force (R)
93       M1A2 (C)
92/93    M1A2 Synthetic Environment Experiment
93       CR-UAV (C)
93       M2A3 (C)
93       155 SADARM (R)
94       NBC Recon System IOTE (R)
94       MLRS Extended Range Guided Round (R)
94       2K Study for EELS (S)
94       ARPA Jamming (R)
94       TUG-V (R)
94/95    AWS-H Quick Reaction Study (S)
94/95    Anti Armor ATD (R)
94/95    M1 Breacher (C)
94       Land Warrior (C)
94       JAVELIN (R)
94       M1A2 IOTE (M-T-M) (S)
94       Anti-Helicopter Mines (R)
94       Engagement Situational Awareness
94       V22 Navy COEA (R)
94/95    OffRoute Smart Mine Clearing (R)
95       Longbow (C)
95       Degraded States (R)
95       Longbow IOTE (M-T-M) (S)
95       Longbow Countermeasures (C)
95/95    Anti Armor Resource Requirements (S)
95/95    Countermine Tactics (R)
95       BCIS DDL (R)
95/95    WAM IOTE (S)
95       Combined Arms Command & Control (R)
95       Task Force XXI (S)
95       Legal Mix VIII (S)
95       Tank Extended Range Munition (R)
96       AAAV (R)
95       International Combat ID (S)

Table III. CASTFOREM Comparisons to Other Simulations

CASTFOREM Compared to    Important Insights Gained

1) CARMONETTE
   -CASTFOREM would have provided the same results if it had been used in the Armor Investment Strategy (AIS) Study
   -End game statistics were comparable
   -Statistics over battle time were comparable

2) Janus
   -Highlighted the differences due to:
      False Targets
      Overkill
      Acquisition Level Required for Trigger Pull
      Use of Vegetation
      Bradley Crossover Range: Missile-to-Gun

3) Ground Wars
   -Helped determine draw methodology for acquisition using NVL P-infinity

4) SIMNET-D
   -Motivated change in tank gunner disengage logic: "Shoot until he stops"
   -Commander uses CITV less than 100 percent of the time

5) ModSAF
   -CASTFOREM has increased fidelity of some missile flyouts
   -This is ongoing presently

Table IV. CASTFOREM Field Test/Battle Comparisons


Armor Combat Operations Model Support (ARCOMS) Field Test Experiment Phase II at Ft. Hood (1986)
   -Shots vs. range compared well
   -Engage times were the same for the defender but longer in CASTFOREM for the attacker

Soviet Artillery Effects (SAE) (1987)
   -Attrition trends for personnel and armored vehicles agreed with the test, but those for trucks did not

Smoke Week 5B Clear Air Trials vs. NVL Predictions for Probability of Detection (1988)
   -Good fit for FLIR
   -Poor fit for optics

Forward Area Air Defense Systems Initial Operational Test and Evaluation (IOTE) (1989)
   -LOS-F-H and PMS average shot ranges from model and test were within 1 sigma of each other
   -Explicit field test movement scripted into model via external events for the first time

AMSAA Multiple Target Acquisition Study (1990)
   -Suggested that each observer has a "detect threshold"
   -As target size increases, probability of detection increases

Study of Artillery Effects (SAE) Phase IIA Technical Shoot (1990)
   -Carleton damage function overestimates damage vs. trucks and underestimates vs. armor

M1A2 EUTE (1992-93)
M1A2 IOTE (1993-94)
   -Range-time scatter plots of shots correlated well
   -IOTE gunners fired conventional rounds at ranges > 3000 m

JAVELIN IOTE (1993)
   -Only a limited amount of pre-test work was done

NBCRS IOTE (1994)
   -Used field test to model tactics of encountering a contaminated area

Apache Longbow IOTE (1995)
   -Cross walk only, for COEA linkage
   -Helicopter sequence of tactics compared well
   -Timelines were longer in the field (still an open issue)

WAM IOTE (1995-96)
   -Code peelback by OEC
   -AMSAA review of data

Task Force XXI
   -Baseline case in progress
   -Digitization case in near future

Distributed Interactive Simulation Search and Target Acquisition Fidelity (DISTAF)
   -Analysis of data in progress to support play of MIS identification

73 EASTING-SWA
   -P-infinity curve is upper bound to detect ranges
   -Disabled variable contrasts until more data are available

Norfolk-SWA
   -Used for Combat ID sensitivities and accreditation for BCIS COEA

EMPIRICAL PROCESSES AND LEAST-SQUARES ESTIMATION


Joseph  C.  Collins 
U.  S.  Army  Research  Laboratory 
Aberdeen  Proving  Ground,  MD  21005 


ABSTRACT 

The theory of continuous regression for Gaussian stochastic processes gives rise to a method for computing parametric and nonparametric estimators of the unknown probability density function $f$ of a random sample. The method generalizes to the case in which the observation density is $\mathcal{K}f$, where $\mathcal{K}$ is an arbitrary known operator. Parametric estimators enjoy the usual optimal properties. Nonparametric estimators are obtained by constrained optimization of a quadratic functional in $f$ and are hence easily computable with existing software. Theoretical properties of the estimators, examples with real data, and a simulation study are included.


INTRODUCTION 


BACKGROUND 

Let the random variables $t_1, \ldots, t_n$ be independent, identically distributed (i.i.d.) with cumulative distribution function (c.d.f.) $F_\theta$. The empirical c.d.f., $F_n(t) = n^{-1}\sum_{i=1}^{n} I(t_i \le t)$, converges to a Gaussian stochastic process in the sense that

$$\sqrt{n}\,(F_n - F_\theta) \to B \circ F_\theta \quad \text{as } n \to \infty, \tag{1}$$

where $B$ is a Brownian bridge, which is a zero-mean Gaussian stochastic process with covariance function $E[B(s)B(t)] = s \wedge t - st$. To estimate $\theta$, we model $F_n$ as

$$F_n(t) = F_\theta(t) + n^{-1/2}\Delta_\theta(t), \tag{2}$$

where $\Delta_\theta$ is a zero-mean Gaussian process with covariance $E[\Delta_\theta(s)\Delta_\theta(t)] = F_\theta(s \wedge t) - F_\theta(s)F_\theta(t)$.

To provide heuristic motivation for the model, solve (1) for $F_n$: for large $n$, $F_n \approx F_\theta + n^{-1/2}\,B \circ F_\theta$, and $B \circ F_\theta$ has exactly the covariance of $\Delta_\theta$. We may estimate the parameter of model (2) by the methodology of continuous regression for Gaussian stochastic processes, which we now review.

CONTINUOUS  REGRESSION 

The following development is due to Parzen. Let $X(t) = M(t) + \Delta(t)$ be a Gaussian stochastic process on a domain $\mathcal{I} \subset \mathbb{R}$ with unknown mean $E[X(t)] = M(t)$ and known covariance $E[\Delta(s)\Delta(t)] = K(s,t)$. We wish to estimate $M$ by the principle of maximum likelihood, so we need to identify an appropriate likelihood ratio, the definition of which involves some preliminary constructions.

First of all, $L(X(t)) = \{\sum_{i=1}^{n} a_i X(t_i) : n \in \mathbb{N},\ t_i \in \mathcal{I},\ a_i \in \mathbb{R}\}$ is a vector space with inner product $\langle u, v\rangle = E[uv]$ and $\langle\cdot,\cdot\rangle$-completion $L_2(X(t))$, which is a Hilbert space.

Second, $H_K$ is the reproducing kernel Hilbert space (RKHS) with reproducing kernel $K$ and inner product $\langle\cdot,\cdot\rangle_K$. Denote $K(\cdot,t)$ by $K_t(\cdot)$. The fundamental reference is Aronszajn.

Next, $\phi_K : H_K \to L_2(X(t))$ is the map characterized by $\phi_K(K_t) = X(t)$; for $M \in H_K$ we write $\phi_K(X, M)$ for the random variable $\phi_K(M)$.

Finally, $Y(t)$ is a zero-mean Gaussian process with $E[Y(s)Y(t)] = K(s,t)$, and $P(K)$ and $P(K, M)$ are the probability measures induced by $Y(t)$ and $X(t)$, respectively, on a suitable space of sample paths.

Approved  for  public  release;  distribution  unlimited. 
With all this in place, we can write down the likelihood ratio. When $M \in H_K$, the measures $P(K)$ and $P(K, M)$ are equivalent, and the Radon-Nikodym derivative of $P(K, M)$ with respect to $P(K)$ is

$$L = \frac{dP(K, M)}{dP(K)}(X) = \exp\left[\phi_K(X, M) - \tfrac{1}{2}\|M\|_K^2\right].$$

The maximum likelihood estimator (regression estimator) of $E[X]$ is the value of $M$ that maximizes the likelihood ratio $L$. It is illuminating to note that for a finite domain $\mathcal{I}$ one obtains the usual least-squares (LS) estimator for $M$, and in the linear model $E[X] = M = Z\beta$ the estimator is the familiar weighted linear regression estimator $\hat\beta = (Z^T K^{-1} Z)^{-1} Z^T K^{-1} X$.
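As a minimal sketch of the finite-domain remark (the grid, the Brownian-bridge covariance, and the function names `log_likelihood_ratio` and `regression_estimator` are illustrative assumptions, not from the paper), on a finite index set the log likelihood ratio is $M^T K^{-1}X - \tfrac12 M^T K^{-1}M$, and its maximizer over $M = Z\beta$ is the weighted least-squares estimator:

```python
import numpy as np

def log_likelihood_ratio(beta, Z, K, x):
    """log dP(K, Z beta)/dP(K) at x on a finite index set.

    There phi_K(x, M) = M^T K^{-1} x and ||M||_K^2 = M^T K^{-1} M.
    """
    M = Z @ beta
    Kinv_M = np.linalg.solve(K, M)
    return Kinv_M @ x - 0.5 * (M @ Kinv_M)

def regression_estimator(Z, K, x):
    """Maximizer over beta: (Z^T K^{-1} Z)^{-1} Z^T K^{-1} x."""
    Kinv_Z = np.linalg.solve(K, Z)                       # K^{-1} Z
    return np.linalg.solve(Z.T @ Kinv_Z, Kinv_Z.T @ x)   # K is symmetric, so Kinv_Z.T = Z^T K^{-1}
```

For example, with $K(s,t) = s \wedge t - st$ on nine interior points of $[0,1]$ and $Z$ holding an intercept and a slope, any perturbation of the returned $\hat\beta$ lowers the log likelihood ratio, because the objective is a strictly concave quadratic in $\beta$.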

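Sequences of processes $X_n(t) = M(t) + n^{-1/2}\Delta(t)$, where $\Delta$ has covariance $K$, have "scaling" properties, which imply that

$$L_n = \frac{dP(K/n, M)}{dP(K/n)}(X_n) = \exp\left[n\,\phi_K(X_n, M) - \tfrac{n}{2}\|M\|_K^2\right] = \left(\exp\left[\phi_K(X_n, M) - \tfrac{1}{2}\|M\|_K^2\right]\right)^{n}.$$

The form of the likelihood ratio, and hence its maximizer, does not depend on the sample size. However, our processes have unknown covariance, so we use a modified version of this estimation scheme.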
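The scaling identity can be checked in the same finite-dimensional setting. The sketch below is an illustration under the assumption of a finite index set (`log_lr`, the grid, and the mean and data vectors are hypothetical choices): replacing $K$ by $K/n$ multiplies $K^{-1}$ by $n$, so the log likelihood ratio is multiplied by $n$ and its maximizer is unchanged.

```python
import numpy as np

def log_lr(M, C, x):
    """log dP(C, M)/dP(C) at x for a Gaussian vector with covariance C on a finite index set."""
    Cinv_M = np.linalg.solve(C, M)
    return Cinv_M @ x - 0.5 * (M @ Cinv_M)

t = np.linspace(0.1, 0.9, 9)
K = np.minimum.outer(t, t) - np.outer(t, t)   # Brownian-bridge covariance (illustrative choice)
M = np.sin(np.pi * t)                          # an arbitrary mean function on the grid
x = M + 0.1 * np.cos(3 * t)                    # an arbitrary observation vector
n = 50
print(np.isclose(log_lr(M, K / n, x), n * log_lr(M, K, x)))   # True
```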

THE  PARAMETRIC  LEAST-SQUARES  ESTIMATION  SCHEME 


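In our models, the mean and covariance share a common parameter $\theta \in \Theta$. The basic model is $X(t) = M_\theta(t) + \Delta_\theta(t)$, with mean $E[X(t)] = M_\theta(t)$ and covariance $E[\Delta_\theta(s)\Delta_\theta(t)] = K_\theta(s,t)$. We obtain a sequence of least-squares estimators for $\theta$. Given $\theta_0$, the sequence $(\theta_1, \theta_2, \theta_3, \ldots)$ is defined by

$$\theta_i = \arg\max_{\theta \in \Theta} \frac{dP(K_{\theta_{i-1}}, M_\theta)}{dP(K_{\theta_{i-1}})}(X).$$

In light of the scaling property, we can apply this concept to any empirical stochastic process $X_n$, where

$$\sqrt{n}\,(X_n - M_\theta) \to \Delta_\theta \quad \text{as } n \to \infty,$$

through use of the model

$$X_n(t) = M_\theta(t) + n^{-1/2}\Delta_\theta(t).$$

For a given observation (data process) $X_n$ and initial parameter guess $\theta_{n,0}$, we define the sequence of estimators $(\theta_{n,1}, \theta_{n,2}, \theta_{n,3}, \ldots)$ by

$$\theta_{n,i} = \arg\max_{\theta \in \Theta} \frac{dP(K_{\theta_{n,i-1}}, M_\theta)}{dP(K_{\theta_{n,i-1}})}(X_n).$$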

We  use  the  likelihood  ratio  for  known  covariance  in  a  recursive  fashion  to  estimate  the  parameter.  Note 
that  this  scheme  is  not  limited  to  estimation  based  on  the  empirical  c.d.f.  Fn,  but  applies  to  any  suitable 
(asymptotically  Gaussian)  empirical  stochastic  process. 
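To make the recursion concrete, here is a minimal grid-based sketch for the empirical c.d.f. of the linear density $f_\theta(t) = \theta + 2(1-\theta)t$ used in the example below. Replacing the continuous process by its values on a finite grid is our approximation, not the paper's computation; the helper names, the grid, and the clipping of iterates to $[0, 2]$ are likewise illustrative. Because $M_\theta = F_\theta$ is linear in $\theta$, each step has a closed-form minimizer.

```python
import numpy as np

def linear_cdf(t, theta):
    """F_theta(t) = theta*t + (1 - theta)*t^2, the c.d.f. of f_theta(t) = theta + 2(1 - theta)t on [0, 1]."""
    return theta * t + (1.0 - theta) * t**2

def sample_linear(n, theta, rng):
    """Inverse-c.d.f. draws from f_theta (valid for theta != 1)."""
    c = 1.0 - theta
    u = rng.uniform(size=n)
    return (-theta + np.sqrt(theta**2 + 4.0 * c * u)) / (2.0 * c)

def iterated_ls(sample, grid, theta0=1.0, n_iter=3):
    """Grid approximation of theta_{n,i} = argmax dP(K_{theta_{n,i-1}}, M_theta)/dP(K_{theta_{n,i-1}})(F_n)."""
    x = np.searchsorted(np.sort(sample), grid, side="right") / len(sample)  # F_n on the grid
    a, b = grid**2, grid - grid**2                    # M_theta = F_theta = a + theta * b
    estimates = [theta0]
    for _ in range(n_iter):
        F = linear_cdf(grid, estimates[-1])
        K = np.minimum.outer(F, F) - np.outer(F, F)   # F(s ^ t) - F(s)F(t), since F is increasing
        Kinv_b = np.linalg.solve(K, b)
        theta = Kinv_b @ (x - a) / (Kinv_b @ b)       # minimizer of -phi_K(x, M_theta) + ||M_theta||_K^2 / 2
        estimates.append(float(np.clip(theta, 0.0, 2.0)))  # keep f_theta a valid density
    return estimates
```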

DISTRIBUTION  OF  THE  PARAMETRIC  LEAST-SQUARES  ESTIMATOR 


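We present the following result concerning the consistency and asymptotic distribution of the general parametric LS estimator without proof. Under suitable regularity conditions, the first-stage ($i = 1$) estimator behaves according to

$$\sqrt{n}\,(\theta_{n,1} - \tau) \to N\!\left(0,\ \frac{\|S_{\gamma\tau}\dot M_\tau\|_{K_\gamma}^2}{\|\dot M_\tau\|_{K_\gamma}^4}\right) \quad \text{as } n \to \infty,$$

and for all $i > 1$, the iterated estimators behave according to

$$\sqrt{n}\,(\theta_{n,i} - \tau) \to N\!\left(0,\ \frac{1}{\|\dot M_\tau\|_{K_\tau}^2}\right) \quad \text{as } n \to \infty,$$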
where

$n$ is the sample size,

$\{M_\theta : \theta \in \Theta\}$ is a parametric family of mean value functions,

$\dot M_\theta(t)$ denotes $\partial M_\theta(t)/\partial\theta$,

$\tau$ is the true parameter value,

$\gamma = \theta_{n,0}$ is an initial parameter guess,

$\theta_{n,1}$ is the estimator assuming covariance parameter $\gamma$, and

$S_{\gamma\tau}$ represents the square root of the map $K_\gamma(t,\cdot) \mapsto K_\tau(t,\cdot)$.

The parametric estimator is asymptotically unbiased and normally distributed. The asymptotic distributions are the same for all iterates $i \ge 2$.

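Information and Optimality. Fisher's information measure (Rao) is

$$I_n(\theta) = E_\theta\left(n\,\phi_{K_\theta}(X_n, \dot M_\theta) - n\,\langle M_\theta, \dot M_\theta\rangle_{K_\theta}\right)^2 = n^2\,\mathrm{Var}_\theta\,\phi_{K_\theta}(X_n, \dot M_\theta) = n\,\|\dot M_\theta\|_{K_\theta}^2.$$

For $i > 1$, the variance of the estimator achieves the Cramér-Rao lower bound as $n \to \infty$. Therefore, the iterated LS estimator with $i = 2$ is "efficient," or asymptotically optimal, and therefore asymptotically equivalent to the maximum likelihood estimator (MLE).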
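For the linear-density model used below, the efficiency claim can be checked numerically. Under the empirical-c.d.f. model with covariance $F_\theta(s \wedge t) - F_\theta(s)F_\theta(t)$, the squared RKHS norm $\|\dot M_\theta\|_{K_\theta}^2$ should equal the per-observation Fisher information $\int_0^1 (1-2t)^2/f_\theta(t)\,dt$. This sketch approximates the norm by a quadratic form on a fine grid and the integral by a midpoint rule; the grid sizes and function names are assumptions.

```python
import numpy as np

def rkhs_norm_sq_on_grid(grid, theta):
    """b^T K^{-1} b for b = dF_theta/dtheta = t - t^2 and K = F_theta(s ^ t) - F_theta(s)F_theta(t)."""
    F = theta * grid + (1.0 - theta) * grid**2
    K = np.minimum.outer(F, F) - np.outer(F, F)
    b = grid - grid**2
    return b @ np.linalg.solve(K, b)

def fisher_information(theta, m=100_000):
    """Per-observation information: integral over [0, 1] of (1 - 2t)^2 / f_theta(t), midpoint rule."""
    t = (np.arange(m) + 0.5) / m
    return np.mean((1.0 - 2.0 * t) ** 2 / (theta + 2.0 * (1.0 - theta) * t))

grid = np.linspace(0.005, 0.995, 199)
print(rkhs_norm_sq_on_grid(grid, 1 / 3), fisher_information(1 / 3))
```

At $\theta = 1/3$ both values should be close to the closed form $(\log 5 - 4/3)/\left(2(2/3)^3\right) \approx 0.466$.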


DENSITY  ESTIMATION 

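Consider the special case of density estimation based on $X_n = F_n$. Let $t_1, \ldots, t_n$ be i.i.d. with c.d.f. $F_\tau(t)$ and probability density function (p.d.f.) $f_\tau(t)$. The negative log LS functional assumes the form

$$J_{n,\gamma}(\theta) = -\int \frac{f_\theta(t)}{f_\gamma(t)}\, dF_n(t) + \frac{1}{2}\int \frac{f_\theta^2(t)}{f_\gamma(t)}\, dt.$$

The LS estimator sequence is $(\gamma = \tau_{n,0}, \tau_{n,1}, \tau_{n,2}, \ldots)$, where $\tau_{n,i} = \arg\min\{J_{n,\tau_{n,i-1}}(\theta) : \theta \in \Theta\}$.

It can be shown that for any $n$, if $\tau_{n,i}$ converges as $i \to \infty$, then $\tau_{n,\infty}$ minimizes

$$J_n(\theta) = -\int \log f_\theta(t)\, dF_n(t),$$

which is the negative log likelihood divided by $n$. Thus, $\tau_{n,\infty}$ is the traditional MLE.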

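Example: Linear Density. Let $t_1, \ldots, t_n$ be i.i.d. on $[0, 1]$ with density $f_\tau(t) = \tau + 2(1 - \tau)t$, where $\tau \in [0, 2]$. With $\gamma$ fixed, the LS functional is

$$J_{n,\gamma}(\theta) = -\int_0^1 \frac{\theta + 2(1 - \theta)t}{\gamma + 2(1 - \gamma)t}\, dF_n(t) + \frac{1}{2}\int_0^1 \frac{\left(\theta + 2(1 - \theta)t\right)^2}{\gamma + 2(1 - \gamma)t}\, dt.$$

For $\gamma = 1$, we get $\tau_{n,1} = 4 - \frac{6}{n}\sum_{i=1}^{n} t_i = 4 - 6\bar t$. If $\gamma \ne 1$, the LS estimator is

$$\tau_{n,1} = \gamma + \frac{(1 - \gamma)^3 \sum_{i=1}^{n} \dfrac{1 - 2t_i}{\gamma + 2(1 - \gamma)t_i}}{n\left(\frac{1}{2}\log\left(\frac{2}{\gamma} - 1\right) - (1 - \gamma)\right)}.$$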


For comparison, the MLE is the solution $\hat\theta$ of

$$\sum_{i=1}^{n} \frac{1 - 2t_i}{\theta + 2(1 - \theta)t_i} = 0,$$

which is not obtainable in closed form.
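The closed-form LS step, its iteration, and the MLE can be compared directly. In this sketch the function names, the clipping of iterates into $(0, 2)$, the series branch near $\gamma = 1$, and the bisection solver are our illustrative choices. `ls_step` evaluates the $\tau_{n,1}$ formula written as $\gamma + \bar s_\gamma / B(\gamma)$, where $\bar s_\gamma = n^{-1}\sum_i (1 - 2t_i)/f_\gamma(t_i)$ and $B(\gamma) = \int_0^1 (1-2t)^2/f_\gamma(t)\,dt = \left[\log(2/\gamma - 1) - 2(1-\gamma)\right]/\left(2(1-\gamma)^3\right)$.

```python
import numpy as np

def ls_step(t, gamma):
    """One LS step for f_theta(t) = theta + 2(1 - theta)t: the minimizer of J_{n,gamma}(theta)."""
    c = 1.0 - gamma
    s = np.mean((1.0 - 2.0 * t) / (gamma + 2.0 * c * t))   # (1/n) sum (1 - 2 t_i) / f_gamma(t_i)
    if abs(c) < 1e-3:
        B = 1.0 / 3.0 + c**2 / 5.0                            # series for B(gamma) near gamma = 1
    else:
        B = (np.log(2.0 / gamma - 1.0) - 2.0 * c) / (2.0 * c**3)
    return gamma + s / B

def ls_iterates(t, gamma=1.0, n_iter=10, eps=1e-6):
    """tau_{n,0} = gamma, tau_{n,1}, ..., each iterate clipped into (0, 2)."""
    est = [gamma]
    for _ in range(n_iter):
        est.append(float(np.clip(ls_step(t, est[-1]), eps, 2.0 - eps)))
    return est

def mle(t, tol=1e-12):
    """Root of sum (1 - 2 t_i) / f_theta(t_i) = 0 by bisection; the score decreases in theta."""
    def score(theta):
        return np.sum((1.0 - 2.0 * t) / (theta + 2.0 * (1.0 - theta) * t))
    lo, hi = 1e-9, 2.0 - 1e-9
    if score(lo) <= 0.0:
        return lo
    if score(hi) >= 0.0:
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Iterating `ls_step` from $\gamma = 1$ should settle on the root returned by `mle`, in line with the statement that a convergent LS sequence yields the MLE; at $\gamma = 1$ the step reduces to $4 - 6\bar t$.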


Simulation: Linear Density. To illustrate these calculations, we conduct a small simulation using the density function of the previous section, $f_\tau(t) = \tau + 2(1 - \tau)t$. The true parameter value is $\tau = 0.333$. Three sample sizes are used: $n = 10$, $n = 100$, and $n = 1000$. In all cases, the initial guess for LS estimation is $\gamma = \tau_{n,0} = 1.0$. The simulation is based on $N = 1000$ runs. The mean squared errors (MSE) presented in the body of Table 1 are calculated as

$$\mathrm{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left(\tau_{n,i}^{(j)} - \tau\right)^2,$$

where $\tau_{n,i}^{(j)}$ is the LS estimate for the $i$th iterate of the $j$th simulation run using a sample size of $n$; the MSE of the maximum likelihood estimator $\hat\theta$ is computed likewise. Even the first-stage estimator $\tau_{n,1}$ has acceptable properties compared to the MLE $\hat\theta$.
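A sketch of the MSE computation for the first-stage estimator with $\gamma = 1$ (that is, $\tau_{n,1} = 4 - 6\bar t$) follows. It is not the paper's simulation code and does not reproduce Table 1; the sampler, seed, and function names are assumptions. Draws come from inverting $F_\tau(t) = \tau t + (1-\tau)t^2$. Since $E[4 - 6\bar t] = \tau$, this MSE is simply the variance $36\,\mathrm{Var}(t)/n$.

```python
import numpy as np

def sample_linear(n, tau, rng):
    """Inverse-c.d.f. draws from f_tau(t) = tau + 2(1 - tau)t on [0, 1] (tau != 1)."""
    c = 1.0 - tau
    u = rng.uniform(size=n)
    return (-tau + np.sqrt(tau**2 + 4.0 * c * u)) / (2.0 * c)

def first_stage_mse(n, tau=0.333, runs=1000, seed=0):
    """MSE = (1/N) sum_j (tau_{n,1}^{(j)} - tau)^2 for the gamma = 1 estimator 4 - 6 * mean(t)."""
    rng = np.random.default_rng(seed)
    estimates = np.array([4.0 - 6.0 * sample_linear(n, tau, rng).mean() for _ in range(runs)])
    return np.mean((estimates - tau) ** 2)

for n in (10, 100, 1000):
    print(n, first_stage_mse(n))
```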

As  stated,  the  LS  estimation  scheme  can  be  applied  to  other  stochastic  processes. 

POISSON  PROCESS  INTENSITY  ESTIMATION 

Here, we consider a Poisson process. Let $N(t) = \sum_i I(T_i \le t)$ be the counting process for a Poisson process with intensity $g$ and mean measure $G = \int g$. The model is

$$N(t) = G(t) + \Delta(t),$$

where $\Delta$ is a Gaussian process with mean $E[\Delta(t)] = 0$ and covariance $E[\Delta(s)\Delta(t)] = G(s \wedge t)$. The LS functional is

$$J_\gamma(\theta) = -\int \frac{g_\theta(t)}{g_\gamma(t)}\, dN(t) + \frac{1}{2}\int \frac{g_\theta^2(t)}{g_\gamma(t)}\, dt.$$

The functional has the same form as in density estimation, even though the covariance structures are different. The convergent estimator is the MLE in this case also.

QUANTILE  FUNCTION 

The quantile function $Q$ is the inverse of the c.d.f. $F$. Likewise, $Q_n$ is the inverse of $F_n$. Our model for the empirical quantile process is

$$Q_n(u) = Q(u) + n^{-1/2}\Delta(u),$$

where $\Delta$ is a Gaussian process with mean $E[\Delta] = 0$ and covariance $E[\Delta(u)\Delta(v)] = Q'(u)Q'(v)(u \wedge v - uv)$.
The  LS  functional  for  this  process  is 


The  asymptotic  covariance  of  the  data  process,  in  this  case  the  quantile  function,  determines  the  form  of 
the  LS  functional. 

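Location and Scale Estimation for the Quantile Function. For location and scale estimation, we have a fixed quantile function $Q_0$. The parameter is $\theta = (a, b)$, and candidate functions have the form

$$Q(u \mid a, b) = a + b\,Q_0(u).$$

For any choice of $\gamma$, the LS estimator is

$$\begin{bmatrix} a_n \\ b_n \end{bmatrix} = \begin{bmatrix} \langle 1, 1\rangle & \langle 1, Q_0\rangle \\ \langle Q_0, 1\rangle & \langle Q_0, Q_0\rangle \end{bmatrix}^{-1} \begin{bmatrix} \langle 1, Q_n\rangle \\ \langle Q_0, Q_n\rangle \end{bmatrix},$$

where $\langle x, y\rangle$ is the RKHS inner product for this model. This estimator is known to be best linear unbiased (Bennett, Parzen). Note that the LS estimator is independent of the covariance parameter.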
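A finite-grid analog of the location-scale estimator is generalized least squares on a few empirical quantiles, with covariance proportional to $Q_0'(u)Q_0'(v)(u \wedge v - uv)$. In this sketch, taking $Q_0$ to be the standard normal quantile function, the grid $u_j = j/(m+1)$, and the function name are all assumptions. Any overall scale factor on the covariance cancels, mirroring the remark that the estimator does not depend on the covariance parameter.

```python
import numpy as np
from statistics import NormalDist

def quantile_location_scale(sample, m=49):
    """Grid GLS estimate of (a, b) in Q(u | a, b) = a + b * Q0(u), with Q0 the standard normal quantile."""
    nd = NormalDist()                                    # assumed reference distribution Q0
    u = np.arange(1, m + 1) / (m + 1)
    Q0 = np.array([nd.inv_cdf(p) for p in u])
    q0 = np.array([1.0 / nd.pdf(z) for z in Q0])         # Q0'(u) = 1 / phi(Q0(u))
    x = np.sort(np.asarray(sample, dtype=float))
    Qn = x[np.ceil(len(x) * u).astype(int) - 1]          # Q_n(u) = inf{x : F_n(x) >= u}
    K = np.outer(q0, q0) * (np.minimum.outer(u, u) - np.outer(u, u))
    Z = np.column_stack([np.ones(m), Q0])
    Kinv_Z = np.linalg.solve(K, Z)
    a_n, b_n = np.linalg.solve(Z.T @ Kinv_Z, Kinv_Z.T @ Qn)
    return a_n, b_n
```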
NONPARAMETRIC ESTIMATION


We return to density estimation. Relaxing restrictions on the candidate density functions gives a nonparametric estimation problem. Let $X_1, \ldots, X_n$ be i.i.d. $F_0$. Let $h$ be a p.d.f. The natural nonparametric version of the density estimation problem is:

minimize $J_{n,h}(f) = -\int \frac{f}{h}\, dF_n + \frac{1}{2}\int \frac{f^2}{h}$ subject to $f \in L_2$, $f \ge 0$, and $\int f = 1$.

This problem has no solution: we can define a sequence of $f$'s with $J$ unbounded below. These $f$'s approach "spikes" at the data values, so the minimizing sequence tends to the empirical point measure, $f_n = dF_n = \frac{1}{n}\sum_{i=1}^{n}\delta_{X_i}$.

PENALIZED  DENSITY  ESTIMATION 


We change the problem by adding a penalty term to the objective functional. This term grows larger as $f$ gets close to $f_n$.

Let $X_1, \ldots, X_n$ be i.i.d. $F_0$, with $f_0 = F_0'$. Let $h$ be a p.d.f. Let $\mathcal{D}$ be a linear differential operator of order $p \ge 1$ with no constant term, and let $\alpha > 0$. Then the problem

$$\text{minimize } J_{n,h}(f) = -\int \frac{f}{h}\, dF_n + \frac{1}{2}\int \frac{f^2}{h} + \frac{\alpha}{2}\int \frac{(\mathcal{D}f)^2}{h}$$

subject to $f \in \mathcal{H}_p$, $f \ge 0$, and $\int f = 1$

has a unique solution (by a theorem of Thompson and Tapia).

The spaces $\mathcal{H}_p = \{f : f^{(p)} \in L_2\}$ are Sobolev spaces. The "correct" weight is $h = f_0$. Penalized estimation has been investigated by Good and Gaskins, Silverman, Thompson and Tapia, Wahba, Cox, O'Sullivan, and many others.

Continuous Representation. The LS functional can be expressed in terms of weighted $L_2$ inner products, so we can write out the differential equation that characterizes the estimator. Inner products are $\langle x, y\rangle = \int xy$ and $\langle x, y\rangle_w = \int xyw$. Identifying $w = 1/h$, the LS functional is

$$J(f) = -\langle f, w f_n\rangle + \frac{1}{2}\langle f, (w + \alpha\mathcal{D}^* w\mathcal{D})f\rangle.$$

The estimator satisfies the differential equation $(w + \alpha\mathcal{D}^* w\mathcal{D})f = w f_n$ subject to $f \ge 0$ and $\int f = 1$.

Discrete Representation and Calculation. The discrete version of the LS functional is

$$J(f) = -f_n^T R f + \frac{1}{2} f^T R f + \frac{\alpha}{2} f^T D^T R D f.$$

Equivalently, with $\langle x, y\rangle_R = x^T R y$, we can take $J(f) = \|f - f_n\|_R^2 + \alpha\|Df\|_R^2$. Then the LS estimation problem

minimize $J(f)$ subject to $f \ge 0$ and $\int f = 1$

is a standard quadratic programming problem. The corresponding matrix equation for $f$ is

$$(R + \alpha D^T R D) f = R f_n.$$

Quadratic programming problems are easy to solve, in the sense that high-quality software is widely available. All calculations in this report were performed using Visual Numerics, Inc., IMSL routines.
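A minimal sketch of the discrete calculation follows. It assumes a uniform weight $h$ (so $R$ is a multiple of the identity), a first-order penalty ($p = 1$, $D$ a first-difference matrix), and a histogram for the discretized $f_n$; none of these choices or names comes from the paper, which solved the constrained problem with IMSL routines. The sketch solves only the matrix equation $(R + \alpha D^T R D)f = R f_n$.

```python
import numpy as np

def penalized_density(x, lo, hi, m=100, alpha=1e-3):
    """Unconstrained solution of (R + alpha D^T R D) f = R f_n on m cells of [lo, hi]."""
    edges = np.linspace(lo, hi, m + 1)
    delta = edges[1] - edges[0]
    counts, _ = np.histogram(x, bins=edges)
    f_n = counts / (len(x) * delta)              # discretized empirical point measure
    R = delta * np.eye(m)                         # uniform weight w (a constant factor cancels)
    R_d = delta * np.eye(m - 1)                   # same weight on the m - 1 differences
    D = np.diff(np.eye(m), axis=0) / delta        # first-difference approximation of d/dx
    f = np.linalg.solve(R + alpha * D.T @ R_d @ D, R @ f_n)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, f, delta
```

In this special case the constraints hold automatically: $D\mathbf{1} = 0$ makes the total mass of $f$ equal that of $f_n$, and $R + \alpha D^T R D$ has nonpositive off-diagonal entries, so its inverse is entrywise nonnegative and $f \ge 0$. With other weights or higher-order penalties this can fail, and a quadratic programming solver is needed.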
Figures 1 through 4 show the effects of varying the parameters that characterize the LS estimator. The data set used is the "Buffalo Snowfall" data, which is sometimes exhibited as a sample from a trimodal distribution. Figure 1 shows the effect of discretization grid size; the sizes depicted are 10, 20, 50, and 100. Figure 2 shows the effect of changing the smoothing parameter $\alpha$. Figure 3 shows the effect of iterating the estimator; there are five curves on this graph. For all practical purposes, convergence is complete by the third iteration. Various derivative penalty functionals were used in Figure 4.

CHARACTERIZATION  OF  THE  LEAST-SQUARES  ESTIMATOR 

By the superposition principle for linear differential equations, the LS density estimator can be written as a sum. The unconstrained solution in the continuous representation is

$$\hat f = \frac{1}{n}\sum_{i=1}^{n} Z_{\alpha, X_i},$$

where $Z_{\alpha,x}$ satisfies the differential equation $(w + \alpha\mathcal{D}^* w\mathcal{D})Z_{\alpha,x} = w\,\delta_x$. A density estimator of this form is referred to as a generalized kernel density estimator.

In fact, for $\mathcal{D}x = x'$ and $h$ uniform, the LS differential equation is $Z_{\alpha,x} - \alpha Z_{\alpha,x}'' = \delta_x$, and the solution on $[-T, T]$ becomes, as $T \to \infty$,

$$Z_{\alpha,x}(t) = \frac{1}{2\sqrt{\alpha}}\exp\left(-|x - t|/\sqrt{\alpha}\right).$$

This is the standard kernel density estimator with a bilateral exponential kernel.
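The kernel interpretation can be checked numerically. This sketch (grid, data, and names are illustrative assumptions) solves the discretized equation $(I + \alpha D^T D) f = f_n$ on a fine grid and compares the result with the bilateral exponential kernel estimator $\frac{1}{n}\sum_i Z_{\alpha,X_i}$; the two should agree up to binning and discretization error.

```python
import numpy as np

def laplace_kde(x, t, alpha):
    """(1/n) sum_i exp(-|X_i - t| / sqrt(alpha)) / (2 sqrt(alpha))."""
    s = np.sqrt(alpha)
    return np.exp(-np.abs(t[:, None] - x[None, :]) / s).mean(axis=1) / (2.0 * s)

def grid_solution(x, lo, hi, m, alpha):
    """Solve (I + alpha D^T D) f = f_n, a discretization of f - alpha f'' = f_n."""
    edges = np.linspace(lo, hi, m + 1)
    delta = edges[1] - edges[0]
    counts, _ = np.histogram(x, bins=edges)
    f_n = counts / (len(x) * delta)
    D = np.diff(np.eye(m), axis=0) / delta
    f = np.linalg.solve(np.eye(m) + alpha * D.T @ D, f_n)
    return 0.5 * (edges[:-1] + edges[1:]), f
```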

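Consistency. Under some technical assumptions, which will not be enumerated here, a result of Bosq and Lecoutre on generalized kernel density estimators gives strong uniform consistency. Consistency requires a sequence of smoothing parameters that go to zero, but not too quickly.

Let $w$ be fixed, and let $\mathcal{D}$ be of order $p$. Let $\hat f_n$ be the minimizer of

$$J_{n,\alpha_n,w}(f) = -\langle f, f_n\rangle_w + \frac{1}{2}\|f\|_w^2 + \frac{\alpha_n}{2}\|\mathcal{D}f\|_w^2.$$

If

$$\alpha_n \to 0 \quad \text{and} \quad (n/\log n)^{2p}\,\alpha_n \to \infty \quad \text{as } n \to \infty,$$

then

$$E\left[\sup_t |\hat f_n(t) - f_0(t)|\right] \to 0 \quad \text{as } n \to \infty$$

and

$$\sup_t |\hat f_n(t) - f_0(t)| \to 0 \quad \text{almost surely as } n \to \infty.$$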

Rates of Convergence. We can provide two different results about the rate of convergence of the LS density estimator.

(1.) Bosq and Lecoutre also provide convergence rates in the supremum norm. For $n$ large enough, there exists a $\delta$ such that for any $\varepsilon > 0$

$$P\left(\sup_t |\hat f_n(t) - f_0(t)| > \varepsilon\right) < 2\exp\left(-n\delta\varepsilon^2\alpha_n^{1/(2p)}\right).$$

This implies

$$\sup_t |\hat f_n(t) - f_0(t)| = O_p(\ldots).$$

(2.) An analysis similar to that of Silverman or Cox and O'Sullivan establishes the following result.

Let $w = 1/f_0$. If $\alpha_n \to 0$ and $\ldots \to \infty$, then

$$E\left[\|\hat f_n - f_0\|_w^2\right] = \ldots$$

This gives a rate of $\ldots$ for $\alpha_n$. (This rate may be applicable for $f_0$ bounded above and away from 0 on compact support.)
SMOOTHING PARAMETER SELECTION FOR DENSITY ESTIMATION


Practically speaking, we need an automatic procedure for selecting the smoothing parameter. Cross-validation is suited to least-squares problems and has been applied to spline smoothing and other statistical estimation and regression problems; see Wahba.

The discrete representation has unconstrained solution $f = M_\alpha f_n$, where $M_\alpha = (R + \alpha D^T R D)^{-1} R$. The generalized cross-validation (GCV) criterion for selection of the smoothing parameter is

$$\text{minimize} \quad C(\alpha) = \frac{\|(I - M_\alpha) f_n\|^2}{\left[\operatorname{tr}(I - M_\alpha)\right]^2}.$$

The GCV score $C(\alpha)$ is an estimate of mean squared error (Härdle, Wahba).
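A sketch of the GCV score for the discrete estimator, assuming uniform weights ($R = \delta I$) and a first-difference penalty; the function name and the candidate grid of $\alpha$ values are hypothetical. For each candidate $\alpha$ it forms $M_\alpha$, the residual $(I - M_\alpha)f_n$, and the score $C(\alpha)$:

```python
import numpy as np

def gcv_scores(f_n, delta, alphas):
    """C(alpha) = ||(I - M_a) f_n||^2 / [tr(I - M_a)]^2 with M_a = (R + a D^T R D)^{-1} R."""
    m = len(f_n)
    R = delta * np.eye(m)                         # uniform weights (assumption)
    R_d = delta * np.eye(m - 1)
    D = np.diff(np.eye(m), axis=0) / delta        # first-difference penalty (assumption, p = 1)
    scores = []
    for a in alphas:
        M = np.linalg.solve(R + a * D.T @ R_d @ D, R)
        resid = f_n - M @ f_n
        scores.append(resid @ resid / np.trace(np.eye(m) - M) ** 2)
    return np.array(scores)
```

The selected smoothing parameter is then the candidate with the smallest score, for example `alphas[np.argmin(scores)]`.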

See Figure 5 for an example of smoothing parameter selection by GCV. A sample of size 100 was drawn from a normal mixture distribution. The different graphs highlighted with x's are LS estimates computed using the indicated values of $\alpha$. The GCV criterion picks the smoothing parameter that gives the graph in the center of the array, with $\alpha = 0.0022$.

INDIRECT  OBSERVATION 

Let $Z_1, \ldots, Z_n$ be i.i.d. with unknown c.d.f. $F_0$ and p.d.f. $f_0$. We observe $X_1, \ldots, X_n$, which have p.d.f. $g_0 = \mathcal{K}f_0$, where $\mathcal{K}$ is some known operator. The empirical functions $g_n$ and $G_n$ are based on the $X_i$. The penalized LS functional for estimation of $f_0$ is

$$J(f) = -\langle g_n, \mathcal{K}f\rangle_w + \frac{1}{2}\|\mathcal{K}f\|_w^2 + \frac{\alpha}{2}\|\mathcal{D}f\|_w^2,$$

where the "correct" weight is $w = 1/\mathcal{K}f_0$.

Continuous Representation. The LS estimator $f$ satisfies the differential equation

$$\left[(\mathcal{K}'f)^* w\,\mathcal{K} + \alpha\mathcal{D}^* w\mathcal{D}\right] f = (\mathcal{K}'f)^* w\, g_n.$$

The prime denotes Gateaux differentiation, and the asterisk denotes the Hilbert-adjoint operator. If $\mathcal{K}$ is a linear operator, the equation becomes

$$(\mathcal{K}^* w\,\mathcal{K} + \alpha\mathcal{D}^* w\mathcal{D}) f = \mathcal{K}^* w\, g_n.$$

Discrete Representation. For linear $\mathcal{K}$, the discrete LS functional is

$$J(f) = -g_n^T R K f + \frac{1}{2} f^T K^T R K f + \frac{\alpha}{2} f^T D^T R D f.$$

Equivalently, with $\langle x, y\rangle_R = x^T R y$, we can take

$$J(f) = \|Kf - g_n\|_R^2 + \alpha\|Df\|_R^2.$$

The LS estimation problem

minimize $J(f)$ subject to $f \ge 0$ and $\int f = 1$

is a standard quadratic programming problem. The corresponding matrix equation for $f$ is

$$(K^T R K + \alpha D^T R D) f = K^T R g_n.$$
SMOOTHING PARAMETER SELECTION FOR THE INDIRECT PROBLEM


This is similar to the standard density estimation case. The discrete representation has unconstrained solution $f = M_\alpha g_n$, where $M_\alpha = (K^T R K + \alpha D^T R D)^{-1} K^T R$. The concept of generalized cross-validation can be adapted to the case of indirect observation. The GCV criterion in this case is

$$\text{minimize} \quad C(\alpha) = \frac{\|(I - K M_\alpha) g_n\|^2}{\left[\operatorname{tr}(I - K M_\alpha)\right]^2}.$$

There is an extra $K$ in the score because $M_\alpha g_n$ is an estimate of $f$, and $K M_\alpha g_n$ estimates $g$. Note that $g$ is the distribution of the (observable) data.

We  conclude  with  two  examples  of  problems  which  fit  into  the  framework  of  density  estimation  from 
indirect  observation,  the  deconvolution  problem  and  the  corpuscle  problem. 

The Deconvolution Problem. Consider the model

$$X_i = Z_i + W_i, \quad 1 \le i \le n,$$

where the $Z_i$ are i.i.d. $f$ (unknown) and the $W_i$ are i.i.d. $k$ (known). We observe the $X_i$ and wish to estimate $f$. The p.d.f. $g$ of the $X_i$ is the convolution of $k$ and $f$:

$$g(t) = [\mathcal{K}f](t) = [k * f](t) = \int k(t - x)\, f(x)\, dx.$$

See Figure 6 for an example of estimation and GCV smoothing parameter selection for the deconvolution problem. A sample of size 250 was drawn from the $10\,\beta(3,5)$ distribution and contaminated with $N(0,4)$ noise. The short, wide distribution is the data density, signal + noise. The narrow distribution is the signal that we wish to recover. The various smoothing parameter values indicated give the different graphs highlighted with x's. The GCV criterion picks the version in the center of the array, with $\alpha = 0.00037$.
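A minimal sketch of the discrete deconvolution equation $(K^T R K + \alpha D^T R D)f = K^T R g_n$ for Gaussian noise follows. The grids, quadrature weights, first-difference penalty, and function names are assumptions, and the constraints $f \ge 0$ and $\int f = 1$ are not enforced, so this is the unconstrained solution only.

```python
import numpy as np

def convolution_matrix(s_grid, t_grid, noise_sd):
    """K[i, j] = k(s_i - t_j) * dt, with k the N(0, noise_sd^2) density."""
    dt = t_grid[1] - t_grid[0]
    d = s_grid[:, None] - t_grid[None, :]
    return np.exp(-0.5 * (d / noise_sd) ** 2) / (noise_sd * np.sqrt(2.0 * np.pi)) * dt

def deconvolve(g_vals, s_grid, t_grid, noise_sd, alpha):
    """Unconstrained solution of (K^T R K + alpha D^T R D) f = K^T R g_n."""
    K = convolution_matrix(s_grid, t_grid, noise_sd)
    ds, dt = s_grid[1] - s_grid[0], t_grid[1] - t_grid[0]
    R = ds * np.eye(len(s_grid))                       # quadrature weights on the data grid
    R_d = dt * np.eye(len(t_grid) - 1)                 # weights for the penalty on the signal grid
    D = np.diff(np.eye(len(t_grid)), axis=0) / dt      # first-difference penalty (p = 1)
    A = K.T @ R @ K + alpha * D.T @ R_d @ D
    return np.linalg.solve(A, K.T @ R @ g_vals), K
```

In practice `g_vals` would be a histogram density of the observed $X_i$ on `s_grid`. As a sanity check, feeding noise-free values $g = Kf$ for the $10\,\beta(3,5)$ signal should give a fit $K\hat f$ close to $g$, since $J(\hat f) \le J(f) = \alpha\|Df\|_R^2$.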

Wicksell's Corpuscle Problem. Spheres with random radii are distributed at random uniformly in a solid medium. The sphere radius p.d.f. is $f_0$, with support $[0, R_M]$. A slice through the medium gives data which are circles, i.e., sphere-slice intersections. The circle radius p.d.f. $g$ is nonlinear in $f_0$:

$$g(t) = \frac{t}{\int_0^{R_M} x f_0(x)\, dx}\int_t^{R_M} (x^2 - t^2)^{-1/2} f_0(x)\, dx.$$

Define the function $f$ by $f(t) = f_0(t) \big/ \int_0^{R_M} x f_0(x)\, dx$. Then

$$g(t) = [\mathcal{K}f](t) = t\int_t^{R_M} (x^2 - t^2)^{-1/2} f(x)\, dx$$

is linear in $f$. We can recover $f_0$, since $f_0(t) = f(t) \big/ \int_0^{R_M} f(x)\, dx$.
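Because $\mathcal{K}$ has an integrable singularity at $x = t$, a simple discretization integrates the kernel exactly over each grid cell, treating $f$ as piecewise constant; the antiderivative of $(x^2 - t^2)^{-1/2}$ is $\operatorname{arccosh}(x/t)$. This is a sketch under those assumptions (names and grids are hypothetical), not the paper's discretization:

```python
import numpy as np

def wicksell_matrix(t_eval, edges):
    """K[i, j] = t_i * integral over cell j (restricted to x > t_i) of (x^2 - t_i^2)^(-1/2) dx."""
    t = np.asarray(t_eval, dtype=float)[:, None]
    lo = np.maximum(edges[:-1][None, :], t)       # cells (or parts of cells) below t contribute 0
    hi = np.maximum(edges[1:][None, :], t)
    return t * (np.arccosh(hi / t) - np.arccosh(lo / t))   # exact cell integrals, f piecewise constant

def recover_f0(f, edges):
    """f_0 = f / integral of f, undoing the division by the mean radius."""
    return f / np.sum(f * np.diff(edges))
```

For constant $f \equiv 1$ on $[0, 1]$ the cell integrals telescope, so `wicksell_matrix(t, edges) @ f` equals $t\,\operatorname{arccosh}(1/t)$ exactly.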

See Figure 7 for an example of estimation and GCV smoothing parameter selection for the corpuscle problem. A sample was drawn from the $\beta(5,2)$ distribution to represent the radii of spheres. This distribution is the taller, skewed solid line; it is the information we want to recover from the sample.

After slicing the spheres randomly, we have 148 circle radii, which are the observable data. Their density is the low, wide curve plotted with the dotted line. The various smoothing parameter values indicated give the different graphs highlighted with x's. The GCV criterion picks the version in the center of the array, with $\alpha = 0.0014$.
Figure 5: GCV for Density Estimation. Normal mixture, $n = 100$.
Figure 6: GCV for the Deconvolution Problem. $10\,\beta(3,5) + N(0,4)$, $n = 250$.
PROJECTION METHODS FOR GENERATING MIXED-LEVEL
FRACTIONAL FACTORIAL AND SUPERSATURATED DESIGNS

Alonzo  Church,  Jr. 

Church  Associates,  Inc. 

Hudson,  Ohio  44235 

ABSTRACT

The definitions of resolution and projectivity have been used to develop an algorithm to find mixed-level fractional factorial designs. Some of the designs found differ from standard designs and have superior projection properties. In addition, their least squares properties are often superior. The algorithm is described, and some useful alternative designs are given in detail.


INTRODUCTION

The purpose of this paper is to discuss four subjects related to projection methods: 1) computer generation of mixed-level designs using projectivity criteria; 2) two-level Matryoshka designs; 3) selecting projectivity = 3 subsets from published design tables like L36; and 4) projectivity criteria for supersaturated designs. In addition to the above, other designs have been generated by the author for certain incomplete Latin squares and related designs. In these cases an additional criterion was used for design evaluation: for incomplete Latin squares, one can use the number of co-occurrences of two treatments in the same block and minimize the sums of squares. To show that these designs might prove useful, we present the following example.

For a more complete development of projection methods, see Church (1993, 1995, 1996).

PROJECTION GENERATION - AN EXAMPLE

In order to answer questions about the importance of four factors which might affect the wear of a new tennis ball design, a production-scale experiment was proposed. Two of the factors were to be included at three levels. These factors were discrete settings which could not be reduced to two levels. The other two factors were continuous, and two levels were sufficient. The full factorial design would require 36 runs, which was too large a number for a production experiment. The factory could tolerate a design requiring 12 or 18 runs but no more. If all factors were discrete, the model implied by Figure 1 would be appropriate. The problem may be visualized as attempting to estimate wear in 36 "boxes," as shown, from 1/3 or 1/2 that number of runs.

Figure 2 shows the EZDOX output listing the runs and design properties for an 18-run experiment. The projection properties of the design are underlined. Included in this listing are some of the design's least squares properties. The output indicates that a typical fractional factorial model requires 7 to 20 parameters (the main-effects model requires 7, while the two-factor interaction model requires 20). Thus the full two-factor interaction model cannot be estimated in 18 runs, and an analysis appropriate for a supersaturated design seems appropriate.
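The parameter counts quoted above follow from the factor levels: a main-effects model needs an intercept plus $L - 1$ parameters per $L$-level factor, and each two-factor interaction adds $(L_a - 1)(L_b - 1)$. A short sketch (the function name is hypothetical):

```python
from itertools import combinations

def model_sizes(levels):
    """(main-effects model size, main effects + all two-factor interactions), intercept included."""
    main = 1 + sum(L - 1 for L in levels)
    two_fi = sum((a - 1) * (b - 1) for a, b in combinations(levels, 2))
    return main, main + two_fi

print(model_sizes([3, 3, 2, 2]))   # (7, 20)
```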

Some of the least squares properties of the design are included in the output. The design is termed "NON-STANDARD" because it is not orthogonal for the main-effects model. Trace efficiencies for the main effects are reported, as well as the average variance of the individual contrasts from which these efficiencies are calculated.

The alias index is derived from the aliasing of main effects by two-factor interactions. A second output from the EZDOX software is a file (Figure 3) which identifies the aliasing in main effects due to two-factor interactions. The file contains a matrix whose columns are contrasts representing the main effects and whose rows are two-factor interaction contrasts. The calculation used is due to Draper and Smith (1966). The alias index (AI) is the column sum of squares, averaged for factors with more than two levels. The smaller the alias index, the less is the bias in main effects due to two-factor interactions.
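A sketch of the alias calculation for two-level factors in $\pm 1$ coding, using the standard alias matrix $A = (X_1^T X_1)^{-1}X_1^T X_2$, with $X_1$ the intercept and main-effect columns and $X_2$ the two-factor interaction columns. The averaging over contrasts for factors with more than two levels is omitted, and the names are illustrative; this is not EZDOX's code.

```python
import numpy as np
from itertools import combinations

def alias_table(design):
    """Rows: two-factor interaction contrasts; columns: main-effect contrasts (two-level, +/-1 coding)."""
    n, k = design.shape
    X1 = np.column_stack([np.ones(n), design])                 # intercept + main effects
    pairs = list(combinations(range(k), 2))
    X2 = np.column_stack([design[:, i] * design[:, j] for i, j in pairs])
    A = np.linalg.solve(X1.T @ X1, X1.T @ X2)                  # alias matrix (X1'X1)^-1 X1'X2
    return A[1:].T, pairs                                      # drop the intercept row

def alias_index(table):
    """Column sums of squares: one value per main effect; smaller means less interaction bias."""
    return (table ** 2).sum(axis=0)
```

For the half fraction with $C = AB$, each main effect is fully aliased with one interaction, so every alias index is 1; for the full $2^3$ factorial all are 0.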
Figure 4 lists the designed runs in standard units and the wear response. The purpose of the analysis is to minimize wear. Figure 5 shows the approach used for this design to identify important factors and two-factor interactions. Four tentative analyses were run as indicated, each containing the main effects and a different subset of the interactions. The smaller the error mean square, the more likely the tentative analysis is to be "correct." Terms in the final model include three main effects and a two-factor interaction. From this model, predictions were calculated for each of the 36 "boxes" (Figure 6). The minimum wear was identified and, after experimental verification, used in production.

This experiment was our first application of a design generated by projection methods.

PROJECTIVITY OF SOME TWO-LEVEL DESIGNS

In Figure 7 we show projections to two and three dimensions of the standard 8-factor, 16-run design of resolution IV (see Box and Hunter (1961)). One two-way and one three-way projection are shown. All 56 three-way projections are identical: each of the eight cells has exactly two runs, while all 28 two-way projections have cells containing four runs each. Not only is this design of resolution IV, but it is also projectivity = 3 by a new criterion proposed by Box and Tyssedal (1992).
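The projection counts can be verified by enumeration. This sketch builds a 16-run $2^{8-4}$ resolution IV design from generators $E = BCD$, $F = ACD$, $G = ABC$, $H = ABD$ (a common textbook choice, assumed here rather than taken from Figure 7) and tallies the cell counts of every two- and three-way projection; the function names are hypothetical.

```python
from itertools import combinations, product
from collections import Counter

def design_2_8_4():
    """16 runs: base factors A-D, generators E = BCD, F = ACD, G = ABC, H = ABD (+/-1 coding)."""
    return [(a, b, c, d, b * c * d, a * c * d, a * b * c, a * b * d)
            for a, b, c, d in product([-1, 1], repeat=4)]

def projection_counts(design, way):
    """For each subset of `way` columns, the set of cell counts (empty cells included) of its table."""
    result = {}
    for cols in combinations(range(len(design[0])), way):
        cells = Counter(tuple(run[c] for c in cols) for run in design)
        counts = list(cells.values()) + [0] * (2 ** way - len(cells))
        result[cols] = set(counts)
    return result

design = design_2_8_4()
print(len(projection_counts(design, 3)), len(projection_counts(design, 2)))   # 56 28
```

Every three-way projection should show all eight cells with exactly two runs, and every two-way projection four runs per cell.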

If we now consider a 12-factor, 16-run experiment designed by conventional methods, both the resolution and projectivity are reduced, as shown in Figure 8. We note that the smaller 12-run design shown in Figure 9 (due to Plackett and Burman (1946)) has resolution III and projectivity = 3 (Box and Bisgaard (1992)) for just one less factor!


In Figures 7, 8, and 9 we have shown projections as if all variables were continuous. At two levels we cannot distinguish in the model between continuous and discrete factors. The model differences become apparent when a factor has three or more levels. It seems more natural to represent discrete factors using tables rather than graphs and to identify the factor levels using integers.

We have discussed two criteria due to Box and coworkers for classifying designs: resolution and projectivity. We summarize the definitions of these properties as follows:

Design  Resolution: 

III  -  Main  Effects  are  aliased  with  2  Factor  Interactions. 

IV  -  Main  Effects  are  independent  of  2  Factor  Interactions. 

V - Main Effects are independent of 2 Factor Interactions, and 2 Factor Interactions are independent of one another.

Design  Projectivity: 

2  -  2-way  tables  have  no  empty  cells. 

3  -  3-way  tables  have  no  empty  cells. 

4  -  4-way  tables  have  no  empty  cells. 

These  definitions  also  extend  to  higher  resolution  and  projectivity. 

In the Box and Tyssedal paper (1992) defining projectivity, it is shown that there exists a 16-run design which is projectivity = 3 for up to 14 factors. Using computer search techniques, we were able to identify the Box-Tyssedal design as based on a different 16×16 Hadamard matrix discussed by Hall (1961). Hall has given a total of five 16×16 Hadamards, one of which leads to Taguchi's L16 and the usual series of two-level fractional factorials discussed in Box and Hunter. Box and Tyssedal show that three other Hadamards lead to projectivity = 3 designs in 12 factors. Since the designs are not given in the Box and Tyssedal paper, we present the best of these in the appendix (Design 1 and Design 2).

We have verified the work of Box and Tyssedal by identifying the best subsets of each of the five Hall Hadamards. Our results are summarized in Table 1. It should be noted that not all subsets of these Hadamards lead to this result. Table 1 tabulates the successful subsets as hits out of the number of possible designs, called combinations.

Of the three Hall Hadamards which lead to 12-factor projectivity = 3 designs, one appears best. We have named this design Matryoshka because of its nested projectivity. In the four a priori most important factors, the design is a full factorial. Adding the next four factors of intermediate importance results in the familiar resolution IV design for 8 factors in 16 runs.
Matryoshka permits a bonus, however: four more factors of lesser importance can be added! In these twelve factors every three-way table has at least one run per cell. Thus Matryoshka is projectivity = 3. In addition, the three remaining degrees of freedom in the design can be used to estimate two-factor interaction groups, one containing AB, a second containing AC, and a third containing BC. The Matryoshka design is given in the Appendix as Design 1.


Table 1

Summary of Designs which are Subsets of the
5 Hall-Hadamard-Based Designs
Matryoshka can be extended to a larger 32-run design for which 24 factors can be included in a projectivity = 3 design.

It should be noted that for 5 factors the standard design has the best projection properties: it is resolution V and
projection = 4.


DESIGN GENERATION vs SUBSET SELECTION

In the preceding two sections we have demonstrated generating a mixed-level design and subsetting Hadamard-based designs
with good projection properties. These can be viewed as two types of subsetting. If we have an array with columns identified
with factors and rows identified with runs, then column subsetting can be used when the number of factors is less than the
number of columns. It is practical to evaluate all possible column subsets and select the best design by the chosen criteria.
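Because the number of column subsets is modest, they can simply be enumerated. As a small illustration under our own assumptions (two-level columns in +/-1 coding, a "hit" meaning a subset with projectivity 3), the sketch below enumerates the 4-column subsets of the 8-run saturated two-level array, here called `L8`, and counts hits out of combinations in the manner of Table 1. It is not the Hall Hadamard computation itself, and all names are hypothetical.

```python
from itertools import combinations, product
from math import comb, prod

def saturated_two_level(m):
    """2^m-run saturated two-level design in +/-1 coding: one column for every
    non-empty product of the m base factors (2^m - 1 columns)."""
    words = [w for r in range(1, m + 1) for w in combinations(range(m), r)]
    return [tuple(prod(run[i] for i in w) for w in words)
            for run in product((-1, 1), repeat=m)]

def has_projectivity(design, cols, p):
    """True if every p-column projection of the chosen columns shows all
    2^p level combinations (two-level columns assumed)."""
    return all(
        len({tuple(run[j] for j in sub) for run in design}) == 2 ** p
        for sub in combinations(cols, p)
    )

def column_subset_hits(design, n_factors, p=3):
    """Evaluate every choice of n_factors columns.
    Returns (hits, combinations, successful column subsets)."""
    n_cols = len(design[0])
    winners = [cols for cols in combinations(range(n_cols), n_factors)
               if has_projectivity(design, cols, p)]
    return len(winners), comb(n_cols, n_factors), winners

L8 = saturated_two_level(3)
hits, total, _ = column_subset_hits(L8, 4)
print(hits, total)  # 7 35
```

The 7 hits are exactly the 4-column sets in which no three columns form a defining word (one column equal to the product of two others); the remaining 28 subsets each contain an aliased triple.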

Row subsetting is usually less practical because of the number of rows, and thus of combinations, to be evaluated. The
algorithm used in EZDOX, a row subsetting program, is therefore a directed search from a random start. The directed search
proceeds by a single-factor level interchange between two rows, and the better 'design' is kept for the next iteration.

Our criteria, which can be used in either row or column subsetting algorithms, are as follows:

z_i - number of i-way tables having at least one empty cell.

q_i - adjusted sum of squared cell counts over all i-way table cells.

The z_i and q_i found define the best design. The q_i relate to design resolution, while the z_i relate to design projectivity.
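Both criteria come from tabulating every i-factor projection. The adjustment in q_i is not spelled out here, so the sketch below assumes one: each table's sum of squared cell counts is reduced by n^2/c, its value when n runs are spread perfectly evenly over c cells. Under that assumption a 16-run three-way table with two runs in each of its eight cells contributes 0, while one with four runs in each of four cells contributes 64 - 32 = 32. The function name and example design are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import prod

def projection_criteria(design, i):
    """Return (z_i, q_i) for a design given as a list of runs (tuples of levels).

    z_i: number of i-way tables with at least one empty cell.
    q_i: over all i-way tables, the sum of squared cell counts minus n**2 / c
         (c = number of cells). This adjustment is an ASSUMPTION, chosen so that
         a perfectly balanced table contributes zero.
    """
    n, k = len(design), len(design[0])
    n_levels = [len({run[j] for run in design}) for j in range(k)]
    z, q = 0, 0.0
    for cols in combinations(range(k), i):
        counts = Counter(tuple(run[j] for j in cols) for run in design)
        cells = prod(n_levels[j] for j in cols)
        if len(counts) < cells:  # some level combination never occurs
            z += 1
        q += sum(c * c for c in counts.values()) - n * n / cells
    return z, q

# Illustrative 4-run 2^(3-1) design with C = AB: every 2-way table is full,
# but the single 3-way table uses only 4 of its 8 cells.
D = [(-1, -1, 1), (-1, 1, -1), (1, -1, -1), (1, 1, 1)]
print(projection_criteria(D, 2))  # (0, 0.0)
print(projection_criteria(D, 3))  # (1, 2.0)
```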

A significant number of i-way tables require evaluation in most practical problems. Table 2 lists this number as a function
of the number of factors. While the number of tables does not depend on the number of factor levels, the number of cells does.
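The number of i-way tables is the binomial coefficient C(k, i) for k factors, whereas the total number of cells sums, over every i-factor subset, the product of the level counts. A small sketch (the function name is ours):

```python
from itertools import combinations
from math import comb, prod

def table_and_cell_counts(levels, i):
    """Number of i-way tables and their total number of cells for factors
    with the given numbers of levels."""
    tables = comb(len(levels), i)  # depends only on the number of factors
    cells = sum(prod(sub) for sub in combinations(levels, i))  # depends on levels
    return tables, cells

print(table_and_cell_counts([2] * 12, 3))        # (220, 1760)
print(table_and_cell_counts([2] * 11 + [3], 2))  # (66, 286)
```

For 12 two-level factors there are 220 three-way tables with 1,760 cells in all; replacing one two-level factor by a three-level factor leaves the table count unchanged but increases the cell count.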

SEARCH METHOD

Figure 9 is a simplified flowchart for the projection search algorithm used. This algorithm belongs to the class of
interchange algorithms; for another example of an interchange algorithm see Nguyen (1996).

The projection algorithm used to generate fractional factorial and supersaturated designs consists of an inner iteration
and an outer iteration. The inner iteration uses only the projection criteria. The outer iteration adds criteria
appropriate to the design type.

INNER ITERATION

The process begins with a random starting design subject to the constraints that the design size, the number of factors,
and the number of levels per factor are all fixed. It is also a constraint that each level of a factor occurs equally often.
For this start the projection criteria are calculated.

Next, the levels of a randomly chosen factor are interchanged between two rows, subject to the constraint that the two rows
differ in level of the chosen factor. The projection criteria are calculated for this interchange. Should the projection
criteria be better than those of the existing design, the interchanged rows replace the original rows.

This interchange process is repeated a large number of times, ensuring that all row pairs and all factors are included in the
interchanges multiple times.
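The inner iteration can be sketched as follows. This is an illustration under stated assumptions, not the EZDOX implementation: "better" is taken to mean a lexicographically smaller (z_3, q_3), q_3 uses the same assumed adjustment (subtracting n^2/cells), and a fixed number of random interchanges stands in for "a large number of times". Swapping the levels of one factor between two rows leaves every level frequency unchanged, which is how the equal-occurrence constraint is preserved throughout. All function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations
from math import prod

def criteria(design, i=3):
    """(z_i, q_i) projection criteria; the q_i adjustment (subtracting
    n**2 / cells) is an ASSUMPTION, not taken from the paper."""
    n, k = len(design), len(design[0])
    n_levels = [len({run[j] for run in design}) for j in range(k)]
    z, q = 0, 0.0
    for cols in combinations(range(k), i):
        counts = Counter(tuple(run[j] for j in cols) for run in design)
        cells = prod(n_levels[j] for j in cols)
        if len(counts) < cells:
            z += 1
        q += sum(c * c for c in counts.values()) - n * n / cells
    return z, q

def random_balanced_start(n_runs, levels, rng):
    """Random design in which every level of every factor occurs equally often."""
    columns = []
    for m in levels:
        if n_runs % m:
            raise ValueError(f"{n_runs} runs cannot balance a {m}-level factor")
        col = [lev for lev in range(1, m + 1) for _ in range(n_runs // m)]
        rng.shuffle(col)
        columns.append(col)
    return [list(run) for run in zip(*columns)]

def inner_iteration(n_runs, levels, n_steps=2000, seed=0):
    """Single-factor level interchanges between two rows, keeping only improvements."""
    rng = random.Random(seed)
    design = random_balanced_start(n_runs, levels, rng)
    best = criteria(design)
    for _ in range(n_steps):
        f = rng.randrange(len(levels))         # randomly chosen factor
        r1, r2 = rng.sample(range(n_runs), 2)  # two distinct rows
        if design[r1][f] == design[r2][f]:
            continue                           # rows must differ in factor f
        design[r1][f], design[r2][f] = design[r2][f], design[r1][f]
        trial = criteria(design)
        if trial < best:                       # fewer empty tables first, then smaller q
            best = trial
        else:                                  # not better: undo the interchange
            design[r1][f], design[r2][f] = design[r2][f], design[r1][f]
    return design, best

design, best = inner_iteration(8, [2, 2, 2, 2], n_steps=500, seed=1)
print(best)
```

Whether a single run reaches z_3 = 0 (projectivity 3) depends on the random start and the number of steps, which is one reason to repeat the inner iteration from several starts and keep the best result.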

OUTER ITERATION

The outer iteration compares the result of the inner iteration with the previous best inner iterations and retains the best
of the best. In the outer iteration, projection criteria are used in addition to other suitable criteria. For supersaturated
designs the other criteria include the maximum |r| among the factors; also included are the average |r| and Det|EE'|. For
other fractional factorial designs, the maximum |r| between main effects and two-factor interactions is included, as is the
alias index.
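The correlation-based criteria of the outer iteration are easy to compute directly. The sketch below assumes two-level factors in +/-1 coding and Pearson correlation; the function names and the two example designs are ours, and the alias index and determinant criterion are not reproduced because their exact definitions are not given here.

```python
from itertools import combinations, product
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two numeric columns (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant column")
    return sxy / sqrt(sxx * syy)

def supersaturated_criteria(design):
    """(maximum |r|, average |r|) over all pairs of factor columns."""
    cols = list(zip(*design))
    rs = [abs(pearson(x, y)) for x, y in combinations(cols, 2)]
    return max(rs), sum(rs) / len(rs)

def me_2fi_criterion(design):
    """Maximum |r| between a main-effect column and a two-factor interaction
    column not involving that factor (+/-1 coding assumed)."""
    cols = list(zip(*design))
    worst = 0.0
    for i, j in combinations(range(len(cols)), 2):
        inter = [a * b for a, b in zip(cols[i], cols[j])]
        for m in range(len(cols)):
            if m not in (i, j):
                worst = max(worst, abs(pearson(cols[m], inter)))
    return worst

def best_of_the_best(candidates, key):
    """Outer-iteration selection: keep the candidate with the smallest key value."""
    return min(candidates, key=key)

# Two illustrative designs: C = AB (resolution III) and D = ABC (resolution IV).
RES_III = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]
RES_IV = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]
print(me_2fi_criterion(RES_III), me_2fi_criterion(RES_IV))  # 1.0 0.0
```

Both designs have orthogonal main effects, so the supersaturated criteria cannot separate them; the main-effect versus two-factor-interaction criterion does, and `best_of_the_best` keeps the resolution IV design.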


DESIGN GENERATION and COMPARISON

In Tables 3 and 4 we present a comparison of some projection generated designs with the accepted standard designs, where
one is known. As the accepted standard designs we have used the two-level designs of Box and Hunter (1961), the orthogonal
arrays used by Taguchi (1987), and designs used in the MINITAB software. Table 3 contains the comparisons and Table 4 contains
some designs for which no standard was available to the author. The reference column of the tables indicates where the design
details can be found. If no reference is given, the details are in the author's database. This database was generated using
EZDOX software. The complete database contains orthogonal designs with factors having 2 to 16 levels and up to 36 runs. Within
the limits of the software the database is complete for up to seven factors. Beyond seven factors some scattered designs are
included.


COLUMN SUBSETS of L36

Taguchi (1987) has proposed the use of 36-run designs in two- and/or three-level factors based on a saturated orthogonal
array which can accommodate up to 11 two-level and 12 three-level factors. We used this design to determine what the
possibilities are for projectivity = 3 designs when six or fewer three-level factors are to be combined with two-level factors
in an experiment. We further ask whether a better alternative exists.

Table 3

A Comparison of Designs: Accepted Standard with Projection Generated Designs

 N   Factor Levels   Type     q3   z3      q4   z4     AI   Ref
16   12@2            Std     512   16    2928  183   .250
                     PGD     512    0    2928  327   .250     1
16   14@2            Std     896   28    6160  385   .260
                     PGD     896    0    6160  749   .260     2
16   4@2 1@4         Std      16    1       1    4   .149
                     PGD      16    2       1    4   .149     3
36   11@2 1@3        Std    2641    0   13861  330   .244
                     PGD       1    0    3301    0   .081     4
36   8@2 3@3         Std    1126    0    4411  168   .169
                     PGD     661    0    2863  213   .132     6


Table 4

Additional Projection Generated Designs

 N   Factor Levels     q3   z3      q4   z4     AI   Ref
16   4@2 1@8            1    6       1    4   .230
20   6@2 1@5           81   10      63   35   .186
24   6@2 1@6          104    5      33   22   .142
24   6@2 1@12           1   15       1   20   .188
36   8@2 3@3          661    0    2863  213   .132
36   6@2 4@3          605    0    2225  172   .131
36   11@2 2@3         917    0    5493  283   .137     5

Table 5 shows the scope of an exhaustive subsetting of L36. Shown are the number of optimal occurrences (hits) and the
number of combinations requiring evaluation. Clearly an optimal subset is a rare event except in the case of one three-level
factor. We have defined an optimal occurrence to be a projection = 3 design with the best levels of the other projection
properties. Not only did we calculate the best L36 subsets, but we also conducted a search to see whether designs better than
the L36 subsets could be found. In every case studied a better design was found. These better designs have been included in
Tables 3 and 4.
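The combination counts in Table 5 agree with simple binomial arithmetic: a design with a two-level and b three-level factors can be cut from L36 in C(11, a) x C(12, b) ways, for example C(11, 8) x C(12, 3) = 165 x 220 = 36,300. A short check (the function name is ours):

```python
from math import comb

def l36_subset_count(n_two, n_three):
    """Ways to choose n_two of the 11 two-level columns and n_three of the
    12 three-level columns of L36."""
    return comb(11, n_two) * comb(12, n_three)

for a, b in [(11, 1), (10, 2), (8, 3), (6, 4), (3, 5)]:
    print(f"{a}@2 {b}@3: {l36_subset_count(a, b)}")
```

The counts grow quickly with the number of three-level factors (228,690 subsets for 6@2 4@3), which is why only a handful of hits turn up in those cases.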


Table 5

Best Subsets of Design L36 by Projection Criteria

Factor Levels    Hits   Combinations
11@2  1@3          12             12
10@2  2@3           6            726
 8@2  3@3           4          36300
 6@2  4@3           1         228690
 3@2  5@3           1         130680


Table 6

Small Mixed-Level Supersaturated PGDs with |r| < .34

                                     2-way              3-way    Reference
Runs  Factors  4's  3's  2's     z2   z22    q2      z3     q3     Design
   6        5    0    1    4      0     0     1      10      1          7
  12       19    0    1   18      0    85   340     562   2925          8
  12       19    1    0   18      0    76   305     558   2491          9
  12       11    1    1    9      0     9    37      71    289         10
  18       30    0    1   29      0    96   673     682  18555         11
  18       24    0    2   22      0    59   355     431   8075         12

MIXED-LEVEL SUPERSATURATED DESIGNS


A supersaturated design is a screening design. It is appropriate when a small number of the proposed factors are active. A
good rule is that this small number should be less than half the design size. In such a design situation the need for more than
two levels in a factor can occur when the factors are discrete. It is not unreasonable to include a small number of such
factors with more than two levels in a design.

Lin (1993 and 1995) gives construction methods for some two-level supersaturated designs and proposes a criterion for
useful supersaturated designs. He suggests that no two columns of such designs should have correlation greater in absolute
value than 0.34. To compute the correlation among factors in the design it is necessary to model the factors which have more
than two levels in such a way that all degrees of freedom are included. For this purpose we have used orthogonal contrasts.
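The correlation check with multi-level factors can be sketched as follows. The specific contrasts are not named in the text, so we assume the usual orthogonal polynomial contrasts for equally spaced levels coded 1, 2, and so on; each factor with m levels is expanded into m - 1 contrast columns so that all of its degrees of freedom enter the check. The example applies the check to Design 7 as listed in the appendix; function names are ours.

```python
from itertools import combinations
from math import sqrt

# ASSUMPTION: orthogonal polynomial contrasts for equally spaced levels 1..m.
CONTRASTS = {
    2: [(-1, 1)],
    3: [(-1, 0, 1), (1, -2, 1)],
    4: [(-3, -1, 1, 3), (1, -1, -1, 1), (-1, 3, -3, 1)],
}

def contrast_columns(factor):
    """Expand one factor (levels coded 1..m) into its m - 1 contrast columns."""
    m = max(factor)
    return [[c[level - 1] for level in factor] for c in CONTRASTS[m]]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def max_abs_correlation(factors):
    """Largest |r| between contrast columns of two different factors."""
    expanded = [contrast_columns(f) for f in factors]
    worst = 0.0
    for a, b in combinations(expanded, 2):
        for x in a:
            for y in b:
                worst = max(worst, abs(pearson(x, y)))
    return worst

# Design 7 from the appendix: one 3-level and four 2-level factors in 6 runs.
DESIGN_7 = [
    [1, 1, 2, 2, 3, 3],
    [1, 2, 1, 2, 1, 2],
    [2, 1, 2, 1, 1, 2],
    [1, 2, 2, 1, 1, 2],
    [2, 1, 1, 2, 1, 2],
]
r = max_abs_correlation(DESIGN_7)
print(round(r, 4), r < 0.34)  # 0.3333 True
```

In Design 7 every pair of two-level columns has |r| = 1/3, while both contrasts of the three-level factor are uncorrelated with all four two-level factors, so the design meets Lin's 0.34 bound.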

Modification of EZDOX was required to obtain the supersaturated designs presented here. Only z2, q2, z3, and q3 were used.
However, it was found necessary to define a supplemental criterion to account for correlation. This new criterion is a measure
of imbalance of the two-way tables formed from all pairs of factors. It is defined as:

z22 - number of unbalanced 2-way tables not having a zero cell.

Tables which cannot be balanced are counted when the imbalance exceeds the best which can be expected.
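A direct reading of this definition is sketched below. Two details are our assumptions rather than the paper's: imbalance is measured as the largest minus the smallest cell count, and the "best which can be expected" is taken as 0 when the runs divide evenly over the cells and 1 otherwise. Design 7 is copied from the appendix; the second example is a hypothetical pair of columns.

```python
from collections import Counter
from itertools import combinations

def z22(factors):
    """Number of two-way tables with no empty cell that are more unbalanced
    than necessary. ASSUMPTIONS: imbalance = largest minus smallest cell count;
    the best that can be expected is 0 if the runs divide evenly over the
    cells and 1 otherwise."""
    n = len(factors[0])
    count = 0
    for x, y in combinations(factors, 2):
        cells = len(set(x)) * len(set(y))
        table = Counter(zip(x, y))
        if len(table) < cells:
            continue  # has an empty cell: such tables are counted by z2 instead
        imbalance = max(table.values()) - min(table.values())
        best_possible = 0 if n % cells == 0 else 1
        if imbalance > best_possible:
            count += 1
    return count

DESIGN_7 = [
    [1, 1, 2, 2, 3, 3],
    [1, 2, 1, 2, 1, 2],
    [2, 1, 2, 1, 1, 2],
    [1, 2, 2, 1, 1, 2],
    [2, 1, 1, 2, 1, 2],
]
print(z22(DESIGN_7))                                              # 0
print(z22([[1, 1, 1, 1, 2, 2, 2, 2], [1, 1, 1, 2, 1, 2, 2, 2]]))  # 1
```

In the second example the 2 x 2 table has cell counts 3, 1, 1, 3; with 8 runs over 4 cells a perfectly balanced table (2 per cell) is possible, so this table counts toward z22.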

In this feasibility study only small n with one or two factors at three and/or four levels were studied. A summary of the
designs found is given in Table 6.

A listing of each design is included in the appendix.
APPENDIX

Projection  Generated  Designs  Discussed  in  the  Text 


Projection 3 or 4 Orthogonal Designs which are better than STANDARD Designs:


Design 1 - 16 Runs, 12 Factors at 2 Levels,
Projection = 3:

F 1: 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
F 2: 1 1 1 1 2 2 2 2 1 1 1 1 2 2 2 2
F 3: 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2
F 4: 1 1 2 2 2 2 1 1 2 2 1 1 1 1 2 2
F 5: 1 2 1 2 1 2 1 2 1 2 1 2 1 2 1 2
F 6: 1 2 1 2 2 1 2 1 2 1 2 1 1 2 1 2
F 7: 1 2 2 1 1 2 2 1 2 1 1 2 2 1 1 2
F 8: 1 2 2 1 2 1 1 2 1 2 2 1 2 1 1 2
F 9: 2 1 2 1 2 1 1 2 2 1 1 2 1 2 1 2
F10: 2 1 2 1 1 2 2 1 1 2 2 1 1 2 1 2
F11: 2 1 1 2 2 1 2 1 1 2 1 2 2 1 1 2
F12: 2 1 1 2 1 2 1 2 2 1 2 1 2 1 1 2

Design  2  -  16  Runs,  14  Factors  at  2  Levels, 
Projection  =  3: 

F 1: 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2
F  2:  1  1  1  1  2  2  2  2  1  1  1  1  2  2  2  2 
F  3:  1  1  2  2  1  1  2  2  1  1  2  2  1  1  2  2 
F  4:  1  1  2  2  2  2  1  1  2  2  1  1  1  1  2  2 
F  5:  2  2  1  1  1  2  1  2  1  2  1  2  1  1  2  2 
F  6:  2  2  1  1  2  1  2  1  2  1  2  1  1  1  2  2 
F  7:  1  2  1  2  2  2  1  1  1  1  2  2  1  2  1  2 
F  8:  1  2  1  2  1  1  2  2  2  2  1  1  1  2  1  2 
F  9:  2  1  2  1  1  2  2  1  2  1  1  2  1  2  1  2 
F10: 2 1 2 1 2 1 1 2 1 2 2 1 1 2 1 2
F11: 2 1 1 2 2 1 2 1 1 2 1 2 2 1 1 2
F12: 2 1 1 2 1 2 1 2 2 1 2 1 2 1 1 2
F13:  1  2  2  1  2  1  1  2  2  1  1  2  2  1  1  2 
F14:  1  2  2  1  1  2  2  1  1  2  2  1  2  1  1  2 


Design 5 - 36 Runs, 13 Factors, 2 @ 3 Levels, 11 @ 2 Levels,

Projection = 3:

F 1: 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3

F 2: 1 1 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 3 3 3 3

F 3: 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2 1 1 2 2

F 4: 1 2 1 2 1 2 2 2 1 2 1 1 1 2 1 2 1 2 1 1 1 2 2 2 1 2 1 2 1 2 1 2 1 2 1 2

2  2  2  2  2  1  1  1  1  2  1  i  2  1  1  1  1  H  2  1  2  1  2  1  1  2  1  2  2  n  2 

F  7i  m  1  n  2  1  1  2  1  2  1  1  2  2  1  2  1  2  2  2  1  1  2  2  1  1  2  1  1  2  1  1  2  2 

2  1  2  2  2  1  2  1  2  1  1  1  1  1  2  1  2  2  2  1  1  2  2  1  1  2  1  2  1  h  1  2  i  2  2 

F10: 2 1 1 1 1 2 2 1 2 1 2 2 1 2 2 2 1 1 2 2 1 2 1 1 2 1 1 2 2 2 1 1 1 2 1 2

F11: 2 2 1 2 2 1 1 2 1 1 1 2 1 2 2 1 1 2 2 1 2 1 1 2 1 1 2 1 2 2 1 1 1 2 2 2

F12:  1  2  1  2  1  2  2  2  1  1  1  2  2  1  2  1  2  1  2  1  2  2  1  1  2  1  1  2  1  1  1  2  2  2  2  1 

F13:  2  2  1  1  1  1  2  2  1  2  2  1  1  1  1  2  2  2  2  1  2  1  2  1  2  1  2  2  2  1  1  1  2  1  1  2 

Design 6 - 36 Runs, 11 Factors, 3 @ 3 Levels, 8 @ 2 Levels,

F 1: 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3
F 2: 1 1 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 3 3 3 3 1 1 1 1 2 2 2 2 3 3 3 3

F 3: 1 1 2 3 1 2 2 3 1 2 3 3 1 2 3 3 1 1 2 3 1 2 2 3 1 2 2 3 1 2 3 3 1 1 2 3

F 4: 1 2 1 2 2 1 2 1 1 2 1 2 2 2 1 1 1 2 1 2 2 1 2 1 1 1 2 2 1 2 1 2 1 2 1 2

F  5:  1  2  2  1  1  2  2  1  2  1  1  2  1  2  1  2  2  2  1  1  1  1  2  2  1  2  1  2  1  1  2  2  2  2  1  1 

M:  1  2  n  1  2  1  1  1  2  2  2  1  2  2  1  1  2  2  1  2  1  1  2  2  2  1  1  2  2  1  2  2  1  1  1 

F  8:  2  1  1  2  2  2  1  1  2  1  2  1  1  1  1  2  1  2  2  2  2  1  2  1  n  2  2  1  1  1  2  2  1  2  1 

m!  2  2  1  1  1  2  2  2  1  1  2  1  1  n  1  1  2  2  n  2  2  1  1  2  2  2  1  1  1  2  2  1 


Design 3 - 16 Runs, 5 Factors, 1 @ 4 Levels, 4 @ 2 Levels,
Projection  =  3: 

F  1:  1  1  1  1  2  2  2  2  3  3  3  3  4  4  4  4 
F  2:  1  1  2  2  1  1  2  2  1  1  2  2  1  1  2  2 
F  3:  1  1  2  2  1  2  1  2  1  2  1  2  2  2  1  1 
F  4:  1  2  1  2  2  1  1  2  1  2  2  1  1  2  1  2 
F  5:  2  1  1  2  2  2  1  1  1  1  2  2  1  2  2  1 


Design 4 - 36 Runs, 12 Factors, 1 @ 3 Levels, 11 @ 2 Levels,
Projection = 4:

F 1: 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 3 3
F 2: 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2 1 1 1 1 1 1 2 2 2 2 2 2

1 2 2 1 2 1 1 2 1 2 2 1 2 1 1 2 1 2 2 1 2 1 1 2 1 2 2 1 2 1 1 2 1 2 2 1
1 2 1 2 2 1 2 2 1 1 1 2 1 2 2 2 1 1 2 1 1 2 1 2 1 2 1 2 2 1 2 2 1 1 1 2
Supersaturated Designs with correlation among factors not exceeding 1/3:

Design 7 - 6 Runs, 5 Factors, 1 @ 3 Levels, 4 @ 2 Levels:
F 1: 1 1 2 2 3 3
F 2: 1 2 1 2 1 2
F 3: 2 1 2 1 1 2
F 4: 1 2 2 1 1 2
F 5: 2 1 1 2 1 2

Design 8 - 12 Runs, 19 Factors, 1 @ 3 Levels, 18 @ 2 Levels:
F 1: 3 3 1 1 3 3 1 2 2 2 2 1
F 2: 2 2 2 2 1 1 1 1 2 1 2 1
F 3: 2 1 2 2 2 1 1 1 1 2 2 1
F 4: 2 1 1 2 2 1 2 1 2 1 2 1
F 5: 2 1 1 2 1 2 2 1 2 2 1 1
F 6: 1 2 1 2 1 2 2 1 1 2 2 1
F 7: 1 2 2 2 2 1 1 2 2 1 1 1
F 8: 1 1 2 2 2 2 1 2 1 1 2 1
F 9: 1 2 1 2 2 1 2 2 1 1 2 1
F10: 1 2 2 2 1 2 1 1 2 2 1 1
F11: 2 2 2 1 1 1 1 1 1 2 2 2
F12: 2 2 2 1 1 1 2 2 2 1 1 1
F13: 1 1 2 1 2 2 1 1 2 1 2 2
F14: 1 2 1 2 1 2 1 1 2 1 2 2
F15: 2 2 1 2 1 1 1 2 1 1 2 2
F16: 1 2 1 1 2 1 2 1 2 1 2 2
F17: 2 1 1 2 1 2 1 2 2 1 1 2
F18: 2 1 2 1 1 2 2 2 1 1 2 1
F19: 2 1 1 2 2 1 1 2 1 2 1 2

Design 9 - 12 Runs, 19 Factors, 1 @ 4 Levels, 18 @ 2 Levels:
F 1: 3 2 4 3 4 2 2 1 3 1 1 4
F 2: 1 2 2 2 1 1 1 1 2 1 2 2
F 3: 2 1 2 2 1 2 1 1 1 2 1 2
F 4: 2 2 1 1 1 2 1 2 1 1 2 2
F 5: 2 2 1 1 1 1 2 1 2 2 1 2
F 6: 2 2 2 1 2 1 1 1 2 1 2 1
F 7: 2 2 1 2 2 2 1 1 1 1 2 1
F 8: 1 2 2 2 1 1 2 2 1 1 1 2
F 9: 2 1 1 1 2 2 1 2 2 1 1 2
F10: 2 1 2 1 1 2 2 1 1 1 2 2
F11: 2 2 2 1 1 1 1 1 1 2 2 2
F12: 2 2 2 1 1 1 2 2 2 1 1 1
F13: 1 1 2 1 1 2 1 2 2 1 2 2
F14: 1 2 1 1 1 2 1 1 2 2 2 2
F15: 2 2 1 1 2 1 1 2 1 2 1 2
F16: 2 1 2 1 1 2 1 2 2 2 1 1
F17: 2 1 1 2 1 1 2 2 1 1 2 2
F18: 2 1 2 1 2 1 2 1 1 2 2 1
F19: 2 2 2 2 1 2 1 2 1 1 1 1

Design 10 - 12 Runs, 11 Factors, 1 @ 4 Levels, 1 @ 3 Levels, 9 @ 2 Levels:
F 1: 3 2 1 3 1 2 2 1 3 1 2 3
F 2: 2 3 1 1 2 4 1 3 4 4 2 3
F 3: 2 2 2 1 1 1 1 1 1 2 2 2
F 4: 2 2 2 1 1 1 2 2 2 1 1 1
F 5: 1 2 2 1 2 2 1 1 2 1 1 2
F 6: 2 1 2 1 1 2 1 2 2 1 2 1
F 7: 1 1 1 1 2 1 2 2 2 1 2 2
F 8: 1 1 2 2 1 2 1 2 1 1 2 2
F 9: 2 1 2 1 2 2 2 1 1 1 1 2
F10: 2 1 1 1 1 2 2 2 1 2 1 2
F11: 1 1 2 1 1 1 2 1 2 2 2 2

Design 11 - 18 Runs, 30 Factors, 1 @ 3 Levels, 29 @ 2 Levels:
F 1: 2 3 3 3 2 1 3 1 1 1 2 2 2 1 1 3 3 2

F 2: 2 1 1 2 1 2 1 1 2 2 2 1 2 2 1 1 2 1

F 3: 1 2 2 1 1 2 1 1 2 1 2 1 2 2 2 2 1 1

F  4:  1  2  2  2  2  2  1  1  1  1  1  2  1  2  2  1  2  1 

F  5:  2  1  2  1  1  2  2  1  1  1  2  1  1  2  2  1  2  2 

F  6:  1  1  1  2  1  2  2  1  2  2  1  2  1  1  1  2  2  2 

F  7:  2  1  2  2  2  1  1  1  1  2  1  2  1  2  1  2  1  2 

F  8:  1  1  2  2  1  1  2  1  2  2  1  2  2  2  2  1  1  1 

F  9:  2  1  2  1  1  1  2  2  1  1  2  2  1  1  2  2  2  1 

F10: 2 1 1 2 2 1 2 1 2 1 1 1 2 1 2 1 2 2
F11: 2 1 2 2 1 2 2 2 1 1 1 2 2 1 1 1 2 1

F12:  2  2  1  1  1  2  1  2  1  2  1  2  2  1  1  2  2  1 

F13:  1  2  1  1  2  1  2  1  2  1  1  2  2  2  1  2  2  1 

F14: 1 2 1 2 1 1 2 2 1 2 2 1 2 1 2 1 1 2

F15:  1  1  2  2  1  1  1  2  1  2  2  1  2  1  2  2  2  1 

F16:  2  1  1  1  1  2  2  1  2  2  1  2  2  1  2  2  1  1 

F17:  2  2  2  1  1  1  1  2  2  2  1  1  1  2  1  1  2  2 

F18:  1  1  2  1  2  2  1  1  1  2  1  2  2  1  2  1  2  2 

F19:  1  1  1  2  1  2  2  2  1  1  2  2  1  2  1  2  1  2 

F20:  1  1  2  1  2  1  2  2  2  1  2  2  2  2  1  1  1  1 

F21:  1  1  1  2  2  1  1  2  2  2  2  2  1  1  2  1  2  1 

F22:  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2 

F23:  1  1  1  1  2  2  2  2  2  1  1  1  1  1  2  2  2  2 

F24:  1  1  2  2  1  1  1  2  2  1  1  1  2  2  1  2  2  2 

F25:  2  2  1  2  1  1  1  2  2  1  1  2  1  2  2  1  1  2 

F26:  2  2  1  2  1  1  1  1  2  1  2  2  1  1  2  2  2  1 

F27:  1  2  2  1  1  1  2  1  2  2  2  2  1  1  1  1  2  2 

F28:  1  2  1  1  1  1  2  2  1  1  1  2  2  2  2  1  2  2 

F29:  1  2  2  2  1  2  1  1  2  1  2  2  2  1  1  1  1  2 

F30:  2  1  2  1  1  1  1  2  2  1  1  2  2  1  2  2  1  2 

Design 12 - 18 Runs, 24 Factors, 2 @ 3 Levels, 22 @ 2 Levels:
F  1:  2  1  2  3  1  3  1  3  2  1  2  2  3  2  3  1  3  1 

F  2:  3  1  1  2  1  2  1  3  2  1  3  2  3  3  2  3  1  2 

F  3:  2  1  2  1  2  2  2  1  1  1  2  1  2  1  2  1  2  1 

F  4:  1  2  1  2  1  2  2  1  1  1  2  2  2  2  1  1  2  1 

F 5: 2 1 1 1 2 2 1 1 2 1 2 2 1 1 1 2 2 2

F  6:  1  2  2  2  2  1  1  1  1  1  2  1  1  2  2  1  2  2 

F  7:  2  1  2  1  2  2  1  1  1  2  1  2  1  2  2  1  1  2 

F  8:  2  1  2  2  1  2  1  1  1  2  1  1  2  1  1  2  2  2 

F  9:  2  1  1  2  1  2  2  2  1  2  2  1  1  1  2  1  1  2 

F10: 1 2 2 1 1 2 1 1 2 2 2 1 2 1 2 2 1 1

F11: 1 1 2 2 1 2 1 1 2 2 2 1 1 2 1 1 2 2

F12:  1  1  2  1  1  1  2  2  1  2  2  2  2  1  2  1  1  2 

F13:  2  1  1  2  1  1  2  1  2  2  2  1  2  2  1  1  1  2 

F14:  1  2  1  1  2  2  1  2  2  2  2  1  2  1  1  1  1  2 

F15:  1  1  1  2  2  1  1  2  1  2  2  2  1  1  2  2  2  1 

F16:  1  1  1  1  1  1  1  1  1  2  2  2  2  2  2  2  2  2 

F17:  1  1  1  1  1  2  2  2  2  1  1  1  1  2  2  2  2  2 

F18: 1 1 2 2 2 1 1 2 2 1 1 2 2 1 1 1 2 2

F19: 2 2 1 2 2 1 1 1 1 1 1 1 2 2 2 2 1 2
F20:  1  1  2  2  2  1  2  1  2  1  2  1  2  1  2  2  1  1 

F21:  1  1  2  1  2  2  2  2  1  1  1  1  2  2  1  2  1  2 

F22: 1 1 2 2 2 2 1 2 1 2 2 1 1 2 1 2 1 1

F23:  1  1  1  2  2  2  2  1  2  2  1  2  2  2  1  1  1  1 

F24:  1  2  2  2  1  2  2  1  1  1  2  2  1  1  1  2  1  2 